HUMAN SIGNAL PEPTIDE-CONTAINING PROTEINS

TECHNICAL FIELD 

This invention relates to nucleic acid and amino acid sequences of human signal 
peptide-containing proteins and to the use of these sequences in the diagnosis, treatment, 
and prevention of cancer and immunological disorders. 

BACKGROUND OF THE INVENTION 

Protein transport is an essential process for all living cells. Transport of an 
individual protein usually occurs via an amino-terminal signal sequence which directs, or 
targets, the protein from its ribosomal assembly site to a particular cellular or extracellular 
location. Transport may involve any combination of several of the following steps: contact 
with a chaperone, unfolding, interaction with a receptor and/or a pore complex, addition of 
energy, and refolding. Moreover, an extracellular protein may be produced as an inactive 
precursor. Once the precursor has been exported, removal of the signal sequence by a 
signal peptidase and posttranslational processing (e.g., glycosylation or phosphorylation) 
activates the protein. Signal sequences are common to receptors, matrix molecules (e.g., 
adhesion, cadherin, extracellular matrix, integrin, and selectin), cytokines, hormones, 
growth and differentiation factors, neuropeptides, vasomediators, phosphokinases, 
phosphatases, phospholipases, phosphodiesterases, G and Ras-related proteins, ion 
channels, transporters/pumps, proteases, and transcription factors. 

G-protein coupled receptors (GPCRs) are a superfamily of integral membrane 
proteins which transduce extracellular signals. GPCRs include receptors for biogenic 
amines, e.g., dopamine, epinephrine, histamine, glutamate (metabotropic effect), 
acetylcholine (muscarinic effect), and serotonin; for lipid mediators of inflammation such 
as prostaglandins, platelet activating factor, and leukotrienes; for peptide hormones such as 
calcitonin, C5a anaphylatoxin, follicle stimulating hormone, gonadotropin releasing 
hormone, neurokinin, oxytocin, and thrombin; and for sensory signal mediators, e.g.,
retinal photopigments and olfactory stimulatory molecules.

The structure of these highly-conserved receptors consists of seven hydrophobic
transmembrane regions, cysteine disulfide bridges between the second and third
extracellular loops, an extracellular N-terminus, and a cytoplasmic C-terminus. Three
extracellular loops alternate with three intracellular loops to link the seven transmembrane
regions. The N-terminus interacts with ligands, the disulfide bridge interacts with agonists
and antagonists, and the large third intracellular loop interacts with G proteins to activate
second messengers such as cyclic AMP (cAMP), phospholipase C, inositol triphosphate,
or ion channel proteins. The most conserved parts of these proteins are the transmembrane
regions and the first two cytoplasmic loops. A conserved, acidic-Arg-aromatic triplet
present in the second cytoplasmic loop may interact with the G proteins. The consensus
pattern,
[GSTALIVMYWCHGSTANCPDEH GSTANC]-[LIVMFYWSTACHDENH]-R-[FYWCSH]-x(2)-[LIVM],
is characteristic of most proteins belonging to this superfamily. (Watson, S. and
Arkinstall, S. (1994) The G-protein Linked Receptor Facts Book, Academic Press, San
Diego, CA, pp. 2-6; and Bolander, F.F. (1994) Molecular Endocrinology, Academic Press,
San Diego, CA, pp. 8-19.)
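A consensus pattern written in this bracketed notation can be translated mechanically into a regular expression for scanning a sequence. The following is a minimal sketch, not part of the disclosure: the function name `prosite_to_regex` and the test sequence are hypothetical, and only the clearly legible tail of the pattern above, R-[FYWCSH]-x(2)-[LIVM], is used. Sequence anchors and other less common notation are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        # Split off an optional repeat count such as x(2) or x(2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":
            core = "."                        # x stands for any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"    # {..} lists excluded residues
        # [..] residue classes and single letters are already valid regex.
        if low and high:
            core += "{%s,%s}" % (low, high)
        elif low:
            core += "{%s}" % low
        parts.append(core)
    return "".join(parts)


# Legible tail of the consensus pattern quoted in the text.
TAIL = "R-[FYWCSH]-x(2)-[LIVM]"
regex = prosite_to_regex(TAIL)

# Hypothetical peptide fragment, used only to exercise the regex.
hit = re.search(regex, "AADRYLAIV")
print(regex, hit.group() if hit else None)
```

The translation is purely syntactic; a match indicates only that a sequence is compatible with the pattern, not that the protein is a member of the superfamily.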

Tetraspanins are a superfamily of membrane proteins which facilitate the formation
and stability of cell-surface signaling complexes containing lineage-specific proteins,
integrins, and other tetraspanins. They are involved in cell activation, proliferation
(including cancer), differentiation, adhesion, and motility. These proteins cross the
membrane four times, have conserved intracellular N- and C-termini and an extracellular,
non-conserved hydrophilic domain. Three highly conserved polar amino acids are located
in the transmembrane domains (TM), an asparagine in TM1 and a glutamate or glutamine
in TM3 and TM4. Two to three conserved charged residues, including a glutamic acid
residue, are present in the cytoplasmic loop between TM2 and TM3. The extracellular
loop between TM3 and TM4 contains four conserved cysteine residues: two in a conserved
CCG motif located about 50 residues C-terminal to TM3; one, often preceded by glycine,
11 residues N-terminal to TM4; and one that may be found in a PXSC
motif. Tetraspanins include, e.g., platelet and endothelial cell membrane proteins,
leukocyte surface proteins, tissue specific and tumorous antigens, and the retinitis
pigmentosa-associated gene peripherin. (Maecker, H.T. et al. (1997) FASEB J.
11:428-442.)

Matrix proteins (MPs) function in formation, growth, remodeling and maintenance
of tissues and as important mediators and regulators of the inflammatory response. The
expression and balance of MPs may be perturbed by biochemical changes that result from
congenital, epigenetic, or infectious diseases. In addition, MPs affect leukocyte migration,
proliferation, differentiation, and activation in immune response.

MPs encompass a variety of proteins and their functions. Extracellular matrix
(ECM) proteins are multidomain proteins that play an important role in the diverse
functions of the ECM. ECM proteins are frequently characterized by the presence of one
or more domains which may include collagen-like domains, EGF-like domains,
immunoglobulin-like domains, fibronectin-like domains, and vWFA-like modules. (Ayad, S.
et al. (1994) The Extracellular Matrix Facts Book, Academic Press, San Diego, CA, pp.
2-16.) Cell adhesion molecules (CAMs) have been shown to stimulate axonal growth
through homophilic and/or heterophilic interactions with other molecules. In addition,
interactions between adhesion molecules and their receptors can potentiate the effects of
growth factors upon cell biochemistry via shared signaling pathways. (Ruoslahti, E.
(1997) Kidney Int. 51:1413-1417.) Cadherins comprise a family of calcium-dependent
glycoproteins that function in mediating cell-cell adhesion in solid tissues of multicellular
organisms. Integrins are ubiquitous transmembrane adhesion molecules that link cells to
the ECM by interacting with the cytoskeleton. Integrins also function as signal
transduction receptors and stimulate changes in intracellular calcium levels and protein
kinase activity. (Sjaastad, M.D. and Nelson, W.J. (1997) BioEssays 19:47-55.) Lectins
are proteins characterized by their ability to bind carbohydrates on cell membranes by
means of discrete, modular carbohydrate recognition domains (CRDs). (Kishore, U. et al.
(1997) Matrix Biol. 15:583-592.) Certain cytokines and membrane-spanning proteins
have CRDs which may enhance interactions with extracellular or intracellular ligands,
with proteins in secretory pathways, or with molecules in signal transduction pathways.
The lipocalin superfamily constitutes a phylogenetically conserved group of more than
forty proteins that function by binding to and transporting a variety of physiologically
important ligands. Members of this family function as carriers of retinoids, odorants,
chromophores, pheromones, and sterols, and a subset of these proteins may be
multifunctional, serving as either a biosynthetic enzyme or as a specific enzyme inhibitor.
(Tanaka, T. et al. (1997) J. Biol. Chem. 272:15789-15795; and van't Hof, W. et al. (1997)
J. Biol. Chem. 272:1837-1841.) Selectins are a family of calcium ion-dependent lectins
expressed on inflamed vascular endothelium and the surface of some leukocytes. They
mediate rolling movement and adhesive contacts between blood cells and blood vessel
walls. The structure of the selectins and their ligands supports the type of bond formation
and dissociation that allows a cell to roll under conditions of flow. (Rossiter, H. et al.
(1997) Mol. Med. Today 3:214-222.)

Protein kinases regulate many different cell proliferation, differentiation, and
signaling processes by adding phosphate groups to proteins. Reversible protein
phosphorylation is a key strategy for controlling protein functional activity in eukaryotic
cells. The high energy phosphate which drives this activation is generally transferred from
adenosine triphosphate molecules (ATP) to a particular protein by protein kinases and
removed from that protein by protein phosphatases. Phosphorylation occurs in response to
extracellular signals, cell cycle checkpoints, and environmental or nutritional stresses.
Protein kinases may be roughly divided into two groups: protein tyrosine kinases (PTKs),
which phosphorylate tyrosine residues, and serine/threonine kinases (STKs), which
phosphorylate serine or threonine residues. A few protein kinases have dual specificity. A
majority of kinases contain a similar 250-300 amino acid catalytic domain which can be
further divided into eleven subdomains. The N-terminal domain, which contains
subdomains I to IV, generally folds into a two-lobed structure which binds and orients the
ATP (or GTP) donor molecule. The larger C-terminal domain, which contains
subdomains VIA to XI, binds the protein substrate and carries out the transfer of the
gamma phosphate from ATP to the hydroxyl group of the target amino acid residue.
Subdomain V links the two domains. Each of the 11 subdomains contains specific residues
and motifs that are characteristic and are highly conserved. (Hardie, G. and Hanks, S.
(1995) The Protein Kinase Facts Book, Vol I, pp. 7-47, Academic Press, San Diego, CA.)



Protein phosphatases remove phosphate groups from molecules previously
modified by protein kinases thus participating in cell signaling, proliferation,
differentiation, contacts, and oncogenesis. Protein phosphorylation is a key strategy used
to control protein functional activity in eukaryotic cells. The high energy phosphate is
transferred from ATP to a protein by protein kinases and removed by protein
phosphatases. There appear to be three, evolutionarily-distinct protein phosphatase gene
families: protein phosphatases (PPs); protein tyrosine phosphatases (PTPs); and




WO 99/33981 PCT/US98/27598 
acid/alkaline phosphatases (APs). PPs dephosphorylate phosphoserine/threonine residues
and are important regulators of many cAMP-mediated hormone responses in cells.
PTPs reverse the effects of protein tyrosine kinases and therefore play a significant role in
cell cycle and cell signaling processes. Although APs dephosphorylate substrates in vitro,
their role in vivo is not well known. (Charbonneau, H. and Tonks, N.K. (1992) Annu. Rev.
Cell Biol. 8:463-493.)

Protein phosphatase inhibitors control the activities of specific phosphatases. A
specific inhibitor of PP-I, I-1, has been identified that when phosphorylated by
cAMP-dependent protein kinase (PKA) specifically binds to PP-I and inhibits its activity. Since
PP-I dephosphorylates many of the proteins phosphorylated by PKA, activation of I-1 by
PKA serves to amplify the effects of PKA and the many cAMP-dependent responses
mediated by PKA. In addition, since PP-I also dephosphorylates many phosphoproteins
that are not phosphorylated by PKA, I-1 activation serves to exert cAMP control over
other protein phosphorylations. I1PP2A is a specific and potent inhibitor of PP-IIA. (Li,
M. et al. (1996) Biochemistry 35:6998-7002.) Since PP-IIA is the main phosphatase
responsible for reversing the phosphorylations of serine/threonine kinases, I1PP2A has
broad effects in controlling protein phosphorylations.

Cyclic nucleotides (cAMP and cGMP) function as intracellular second messengers
to transduce a variety of extracellular signals, including hormones, light, and
neurotransmitters. Cyclic nucleotide phosphodiesterases (PDEs) degrade cyclic
nucleotides to their corresponding monophosphates, thereby regulating the intracellular
concentrations of cyclic nucleotides and their effects on signal transduction. At least
seven families of mammalian PDEs have been identified based on substrate specificity and
affinity, sensitivity to cofactors, and sensitivity to inhibitory drugs. (Beavo, J.A. (1995)
Physiological Reviews 75:725-748.) PDEs are composed of a catalytic domain of ~270
amino acids, an N-terminal regulatory domain responsible for binding cofactors and, in
some cases, a C-terminal domain with unknown function. Within the catalytic domain,
there is approximately 30% amino acid identity between PDE families and ~85-95%
identity between isozymes of the same family. Furthermore, within a family there is
extensive similarity (>60%) outside the catalytic domain, while across families there is
little or no sequence similarity. A variety of diseases have been attributed to increased
PDE activity and inhibitors of PDEs have been used effectively as anti-inflammatory,
antihypertensive, and antithrombotic agents. (Verghese, M.W. et al. (1995) Mol.
Pharmacol. 47:1164-1171; and Banner, K.H. and Page, C.P. (1995) Eur. Respir. J.
8:996-1000.)

Phospholipases (PLs) are enzymes that catalyze the removal of fatty acid residues
from phosphoglycerides. PLs play an important role in transmembrane signal transduction
and are named according to the specific ester bond in phosphoglycerides that is
hydrolyzed, i.e., A1, A2, C or D. PLA2 cleaves the ester bond at position 2 of the glycerol
moiety of membrane phospholipids giving rise to arachidonic acid. Arachidonic acid is
the common precursor to four major classes of eicosanoids: prostaglandins, prostacyclins,
thromboxanes and leukotrienes. Eicosanoids are signaling molecules involved in the
contraction of smooth muscle, platelet aggregation, and pain and inflammatory responses.
PLC is an important link in certain receptor-mediated signal transduction pathways.
Extracellular signaling molecules including hormones, growth factors, neurotransmitters,
and immunoglobulins bind to their respective cell surface receptors and activate PLC.
Activated PLC generates second messenger molecules from the hydrolysis of inositol
phospholipids that regulate cellular processes, e.g., secretion, neural activity, metabolism
and proliferation. (Alberts, B. et al. (1994) Molecular Biology of The Cell, Garland
Publishing, Inc., New York, NY, pp. 85, 211, 239-240, 642-645.)

The nucleotide cyclases, i.e., adenylate and guanylate cyclase, catalyze the
synthesis of the cyclic nucleotides, cAMP and cGMP, from ATP and GTP, respectively.
They act in concert with phosphodiesterases, which degrade cAMP and cGMP, to regulate
the cellular levels of these molecules and their functions. cAMP and cGMP function as
intracellular second messengers to transduce a variety of extracellular signals, e.g.,
hormones, light, and neurotransmitters. Adenylate cyclase is a plasma membrane
protein that is coupled with various hormone receptors also located on the plasma
membrane. Binding of a hormone to its receptor activates adenylate cyclase which, in
turn, increases the levels of cAMP in the cytosol. The activation of other molecules by
cAMP leads to the cellular effect of the hormone. In a similar manner, guanylate cyclase
participates in the process of visual excitation and phototransduction in the eye. (Stryer,
L. (1988) Biochemistry, W.H. Freeman and Co., New York, pp. 975-980, 1029-1035.)

Cytokines are produced in response to cell perturbation. Some cytokines are produced as
precursor forms, and some form multimers in order to become active. They are produced
in groups and in patterns characteristic of the particular stimulus or disease, and the
members of the group interact with one another and other molecules to produce an overall
biological response. Interleukins, neurotrophins, growth factors, interferons, and
chemokines are all families of cytokines which work in conjunction with cellular receptors
to regulate cell proliferation and differentiation and to affect such activities as leukocyte
migration and function, hematopoietic cell proliferation, temperature regulation, acute
response to infections, tissue remodeling, and cell survival. Studies using antibodies or
other drugs that modify the activity of a particular cytokine are used to elucidate the roles
of individual cytokines in pathology and physiology.

Chemokines are small chemoattractant cytokines which are active in leukocyte
trafficking. Initially, chemokines were isolated and purified from inflamed tissues, but
recently several chemokines have been discovered through molecular cloning techniques.
Chemokines have been shown to be active in cell activation and migration, angiogenic and
angiostatic activities, suppression of hematopoiesis, HIV infectivity, and promoting Th-1
(IL-2-, interferon γ-stimulated) cytokine release.

Chemokines generally contain 70-100 amino acids and are subdivided into four 
subfamilies based on the presence and arrangement of conserved CXC, CC, CX3C and C 
motifs. The CXC (alpha), CC (beta), and CX3C chemokines contain four conserved 
cysteines. The CC subfamily is active on monocytes, lymphocytes, eosinophils, and mast 
cells; the CXC subfamily, on neutrophils; CX3C and C subfamilies, on T-cells. Many of
the CC chemokines have been characterized functionally as well as structurally. (Callard,
R. and Gearing, A. (1994) The Cytokine Facts Book, Academic Press, New York, NY, pp.
181-190, 210-213, 223-227.)
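The subfamily assignment described above depends on how the first conserved cysteines are arranged. The sketch below illustrates one simplified way such a rule could be applied to a mature (signal peptide-removed) sequence. It is an assumption-laden illustration only: the function name `classify_chemokine` is hypothetical, the rule looks only at the spacing after the first cysteine, and the short test strings are illustrative fragments rather than data from this disclosure.

```python
import re


def classify_chemokine(mature_seq: str) -> str:
    """Assign a subfamily from the spacing after the first cysteine.

    Simplifying assumption: the first cysteine in the mature sequence
    is the first conserved cysteine of the motif.
    """
    seq = mature_seq.upper()
    first = seq.find("C")
    if first == -1:
        return "unclassified"          # no cysteine at all
    window = seq[first:first + 5]      # enough residues to see C-X-X-X-C
    if re.match(r"CC", window):
        return "CC"                    # adjacent cysteines
    if re.match(r"C[^C]C", window):
        return "CXC"                   # one intervening residue
    if re.match(r"C[^C]{3}C", window):
        return "CX3C"                  # three intervening residues
    return "C"                         # lone leading cysteine


# Illustrative fragments only.
for fragment in ("AKPVCCFSY", "SAKELRCQCIKT", "QHHGVTKCNITCSKM", "VGSEVSDKRTCVSLT"):
    print(fragment, classify_chemokine(fragment))
```

A real classifier would also confirm the remaining conserved cysteines and the overall length of 70-100 amino acids before assigning a subfamily.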



Growth and differentiation factors function in intercellular communication. Once
secreted from the cell, some factors require oligomerization or association with ECM in
order to function. Complex interactions among these factors and their receptors result in
the stimulation or inhibition of cell division, cell differentiation, cell signaling, and cell
motility. Some factors act on their cell of origin (autocrine signaling); on neighboring
cells (paracrine signaling); or on distant cells (endocrine signaling).

There are three broad classes of growth and differentiation factors. The first class
includes the large polypeptide growth factors, e.g., epidermal growth factor, fibroblast
growth factor, transforming growth factor, insulin-like growth factor, and platelet-derived
growth factor. Each of these defines a family of related molecules which stimulate cell
proliferation for wound healing, bone synthesis and remodeling, and regeneration of
epithelial, epidermal, and connective tissues, and induce differentiation of embryonic
tissues. Nerve growth factor functions specifically as a neurotrophic factor, and all induce
differentiation of embryonic tissues. The second class includes the hematopoietic growth
factors which stimulate the proliferation and differentiation of blood cells such as
B-lymphocytes, T-lymphocytes, erythrocytes, platelets, eosinophils, basophils, neutrophils,
macrophages, and their stem cell precursors. These factors include colony-stimulating
factors, erythropoietin, and cytokines, e.g., interleukins, interferons (IFNs), and tumor
necrosis factor (TNF). Cytokines are secreted by cells of the immune system and function
in immunomodulation. The third class includes small peptide factors, e.g., bombesin,
vasopressin, oxytocin, endothelin, transferrin, angiotensin II, vasoactive intestinal peptide,
and bradykinin, which function as hormones to regulate cellular functions other than
proliferation.

Growth and differentiation factors have been shown to play critical roles in
neoplastic transformation of cells in vitro and in tumor progression in vivo. Inappropriate
expression of growth factors by tumor cells may contribute to vascularization and
metastasis of melanotic tumors. In hematopoiesis, growth factor misregulation can result
in anemias, leukemias and lymphomas. Certain growth factors, e.g., IFN, are cytotoxic to
tumor cells both in vivo and in vitro. Moreover, growth factors and/or their receptors are
related both structurally and functionally to oncoproteins. In addition, growth
factors affect transcriptional regulation of both proto-oncogenes and oncosuppressor
genes. (Pimentel, E. (1994) Handbook of Growth Factors, CRC Press, Ann Arbor, MI, pp.
6-25.)

Proteolytic enzymes or proteases degrade proteins by reducing the activation
energy needed for the hydrolysis of peptide bonds. The major families are the zinc, serine,
cysteine, thiol, and carboxyl proteases.

Zinc proteases, e.g., carboxypeptidase A, have a zinc ion bound to the active site,
recognize C-terminal residues that contain an aromatic or bulky aliphatic side chain, and
hydrolyze the peptide bond adjacent to the C-terminal residues. Serine proteases have an
active site serine residue and include digestive enzymes, e.g., trypsin and chymotrypsin,
components of the complement and blood-clotting cascades, and enzymes that control the
degradation and turnover of extracellular matrix (ECM) molecules. Subfamilies of serine
proteases include tryptases (cleavage after arginine or lysine), aspases (cleavage after
aspartate), chymases (cleavage after phenylalanine or leucine), metases (cleavage after
methionine), and serases (cleavage after serine). Cysteine proteases (e.g., cathepsin) are
produced by monocytes, macrophages and other immune cells and are involved in diverse
cellular processes ranging from the processing of precursor proteins to intracellular
degradation. Overproduction of these enzymes can cause the tissue destruction associated
with rheumatoid arthritis and asthma. Thiol proteases, e.g., papain, contain an active site
cysteine and are widely distributed within tissues. Thiol proteases effect catalysis through
a thiol ester intermediate facilitated by a proximal histidine side chain. Carboxyl
proteases, e.g., pepsin, are active only under acidic conditions (pH 2 to 3). The active site
of pepsin contains two aspartate residues; when one aspartate is ionized and the other is
not, the enzyme is active. A common feature of the carboxyl proteases is that they are
inhibited by very low concentrations (10⁻¹⁰ M) of the inhibitor pepstatin. A substrate
analog which induces structural changes at the active site of a protease functions as an
antagonist or inhibitor.
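The cleavage preferences listed for the serine protease subfamilies can be expressed as a simple lookup table. The sketch below is illustrative only: the names `CLEAVAGE_RULES`, `cleavage_sites`, and `digest` and the test peptide are hypothetical, and the rules encode only the residue immediately preceding the scissile bond, ignoring the neighboring-residue and structural effects that influence real proteolysis.

```python
# Residues after which each serine protease subfamily cleaves,
# taken from the subfamily descriptions in the text.
CLEAVAGE_RULES = {
    "tryptase": "RK",   # arginine or lysine
    "aspase": "D",      # aspartate
    "chymase": "FL",    # phenylalanine or leucine
    "metase": "M",      # methionine
    "serase": "S",      # serine
}


def cleavage_sites(sequence, subfamily):
    """Return 1-based positions of residues after which cleavage occurs."""
    residues = CLEAVAGE_RULES[subfamily]
    # The C-terminal residue is skipped because no bond follows it.
    return [i + 1 for i, aa in enumerate(sequence[:-1]) if aa in residues]


def digest(sequence, subfamily):
    """Split a sequence into the fragments produced by complete cleavage."""
    fragments, start = [], 0
    for pos in cleavage_sites(sequence, subfamily):
        fragments.append(sequence[start:pos])
        start = pos
    fragments.append(sequence[start:])
    return fragments


# Hypothetical peptide used only to exercise the rules.
PEPTIDE = "AKGDFMSRLT"
for name in CLEAVAGE_RULES:
    print(name, cleavage_sites(PEPTIDE, name), digest(PEPTIDE, name))
```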

Guanosine triphosphate-binding proteins (G proteins) participate in intracellular 
signal transduction and control regulatory pathways through cell surface receptors. These 
receptors respond to hormones, growth factors, neuromodulators, or other signaling 

molecules, by binding GTP. Binding of GTP leads to the production of cAMP which
controls phosphorylation and activation of other proteins. During this process, the 
hydrolysis of GTP acts as an energy source as well as an on-off switch for the GTPase 
activity. 

The G proteins are small proteins which consist of single 21-30 kDa polypeptides. 
They can be classified into five subfamilies: Ras, Rho, Ran, Rab, and ADP-ribosylation
factor. These proteins regulate cell growth, cell cycle control, protein secretion, and
intracellular vesicle interaction. In particular, the Ras proteins are essential in transducing
signals from receptor tyrosine kinases to serine/threonine kinases which control cell
growth and differentiation. Mutant Ras proteins, which bind but cannot hydrolyze GTP,
are permanently activated and cause continuous cell proliferation or cancer.

All five subfamilies share common structural features and four conserved motifs, I
to IV. Motif I is the most variable and has the signature of GXXXXGK, in which lysine
interacts with the β- and γ-phosphate groups of GTP. Motifs II, III, and IV have
DTAGQE, NKXD, and EXSAX as their respective signatures and regulate the binding of
γ-phosphate, GTP, and the guanine base of GTP, respectively. Most of the membrane-bound
G proteins require a carboxy terminal isoprenyl group (CAAX), added
posttranslationally, for membrane association and biological activity. The G proteins also
have a variable effector region, located between motifs I and II, which is characterized as
the interaction site for guanine nucleotide exchange factors or GTPase-activating proteins.
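The four motif signatures can be written as regular expressions, with X read as any residue, to locate candidate motifs in a sequence. The following is a minimal sketch under that assumption. The names `MOTIFS` and `find_motifs` are hypothetical, and the test sequence is a made-up string assembled so that each signature occurs once; it is not a sequence from this disclosure.

```python
import re

# Signatures from the text, with X translated to "." (any residue).
MOTIFS = {
    "I": r"G.{4}GK",    # GXXXXGK
    "II": r"DTAGQE",    # DTAGQE
    "III": r"NK.D",     # NKXD
    "IV": r"E.SA.",     # EXSAX
}


def find_motifs(sequence):
    """Return {motif: (0-based start, matched text)} or None if absent."""
    hits = {}
    for name, pattern in MOTIFS.items():
        match = re.search(pattern, sequence)
        hits[name] = (match.start(), match.group()) if match else None
    return hits


# Made-up 50-residue test string containing one copy of each signature.
TOY = "MTEYKLVVVGAGGVGKSALT" "ILDTAGQEEY" "LVGNKCDLAA" "FIETSAKTRQ"
for name, hit in find_motifs(TOY).items():
    print(name, hit)
```

Short signatures such as NKXD occur by chance in many proteins, so a scan of this kind is more informative when all four motifs are found in the expected order.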

Eukaryotic cells are bound by a membrane and subdivided into membrane-bound
compartments. As membranes are impermeable to many ions and polar molecules,
transport of these molecules is mediated by ion channels, ion pumps, transport proteins, or
pumps. Symporters and antiporters regulate cytosolic pH by transporting ions and small
molecules, e.g., amino acids, glucose, and drugs, across membranes; symporters transport
small molecules and ions in the same direction, and antiporters, in the opposite direction.
Transporter superfamilies include facilitative transporters and active ATP binding cassette
transporters involved in multiple-drug resistance and the targeting of antigenic peptides to
MHC Class I molecules. These transporters bind to a specific ion or other molecule and
undergo conformational changes in order to transfer the ion or molecule across a
membrane. Transport can occur by a passive, concentration-dependent mechanism or can
be linked to an energy source such as ATP hydrolysis or an ion gradient.

Ion channels are formed by transmembrane proteins which form a lined
passageway across the membrane through which water and ions, e.g., Na+, K+, Ca2+, and
Cl-, enter and exit the cell. For example, chloride channels are involved in the regulation
of the membrane electric potential as well as absorption and secretion of ions across the
membrane. In intracellular membranes of the Golgi apparatus and endocytic vesicles,
chloride channels also regulate organelle pH. Electrophysiological and pharmacological
studies suggest that a variety of chloride channels exist in different cell types and that
many of these channels have one or more protein kinase phosphorylation sites.

Ion pumps are ATPases which actively maintain membrane gradients. Ion pumps
can be grouped into three classes, P, V, and F, according to their structure and
function. All have one or more binding sites for ATP on the cytosolic face of the
membrane. The P-class ion pumps consist of two α and two β transmembrane subunits,
include Ca2+ ATPase and Na+/K+ ATPase, and function in transporting H+, Na+, K+, and
Ca2+ ions. The V- and F-class ion pumps have similar structures, a cytosolic domain
formed by at least five extrinsic polypeptides and at least 2 transmembrane proteins, and
only transport H+. F-class H+ pumps have been identified from the membranes of
mitochondria and chloroplasts, and V-class H+ pumps regulate acidity inside lysosomes,
endosomes, and plant vacuoles.

A family of structurally related intrinsic membrane proteins known as facilitative
glucose transporters catalyzes the movement of glucose and other selected sugars across the
plasma membrane. The proteins in this family contain a highly conserved, large
transmembrane domain made of 12 transmembrane α-helices, and several less conserved,
asymmetric, cytoplasmic and exoplasmic domains. (Pessin, J.E. and Bell, G.I. (1992)
Annu. Rev. Physiol. 54:911-930.)

Amino acid transport is mediated by Na+-dependent amino acid transporters.
These transporters are involved in gastrointestinal and renal uptake of dietary and cellular
amino acids and the re-uptake of neurotransmitters. Transport of cationic amino acids is
mediated by the system y+ family members and the cationic amino acid transporter (CAT)
family. Members of the CAT family share a high degree of sequence homology, and each
contains 12-14 putative transmembrane domains. (Ito, K. and Groudine, M. (1997) J.
Biol. Chem. 272:26780-26786.)

Proton-coupled, 12 membrane-spanning domain transporters such as PEPT 1 and
PEPT 2 are responsible for gastrointestinal absorption and for renal reabsorption of
peptides using an electrochemical H+ gradient as the driving force. A heterodimeric
peptide transporter, consisting of TAP 1 and TAP 2, is associated with antigen processing.
Peptide antigens are transported across the membrane of the endoplasmic reticulum so
they can be presented to the major histocompatibility complex class I molecules. Each
TAP protein consists of multiple hydrophobic membrane spanning segments and a highly
conserved ATP-binding cassette. (Boll, M. et al. (1996) Proc. Natl. Acad. Sci.
93:284-289.)

Hormones are secreted molecules that circulate in the body fluids and bind to
specific receptors on the surface of, or within, target tissue cells. Although they have
diverse biochemical compositions and mechanisms of action, hormones can be grouped
into two categories. One category consists of small lipophilic molecules that diffuse
through the plasma membrane of target cells, bind to cytosolic or nuclear receptors, and
form a complex that alters gene expression. Examples of this category include retinoic acid,
thyroxine, and the cholesterol derived steroid hormones, progesterone, estrogen,
testosterone, cortisol, and aldosterone. These hormones have a long half-life, e.g., several
hours to days, and long-term effects on their target cells. Their solubility in the blood may
be increased by their association with carrier molecules. Within the target cell nucleus,
hormone/receptor complexes bind to specific response elements in target gene regulatory
regions.

A second category consists of hydrophilic hormones that function by binding to
cell surface receptors and transducing the signal across the plasma membrane. Examples
of this category include amino acid derivatives, such as catecholamines, e.g., epinephrine,
norepinephrine, and histamine; and peptide hormones, e.g., glucagon, insulin, gastrin, secretin,
cholecystokinin, adrenocorticotropic hormone, follicle stimulating hormone, luteinizing
hormone, thyroid stimulating hormone, parathormone, and vasopressin. Peptide hormones
are synthesized as inactive forms and stored in secretory vesicles. These hormones are
activated by protease cleavage before being released from the cell. Many hydrophilic
hormones have a very short half-life and effect, e.g., seconds to hours, and are inactivated
by proteases in the blood. (Lodish et al. (1995) Molecular Cell Biology, Scientific
American Books Inc., New York, NY, pp. 856-864.)

Neuropeptides and vasomediators (NP/VM) comprise a large family of endogenous 

20 signaling molecules. Included in the family are neurotransmitters such as bombesin, 
neuropeptide Y, neurotensin, neuromedin N, melanocortins, opioids, e.g., enkephalins, 
endorphins and dynorphins, galanin, somatostatin, tachykinins, vasopressin, and 
vasoactive intestinal peptide, and circulatory system-borne signaling molecules, e.g., 
angiotensin, complement, calcitonin, endothelins, formyl-methionyl peptides, glucagon, 

25 cholecystokinin and gastrin. These proteins are synthesized as "pre-pro" molecules, and 
are activated and inactivated by proteolytic cleavage. NP/VMs can transduce signals 
directly, modulate the activity or release of other neurotransmitters and hormones, and act 
as catalytic enzymes in cascades. The effects of NP/VMs range from extremely brief to 
long-lasting (melanocortin-mediated changes in skin melanin). 

Regulatory molecules turn individual genes or groups of genes on and off in response to 
various inductive mechanisms of the cell or organism; act as transcription factors by 
determining whether or not transcription is initiated, enhanced, or repressed; and splice 
transcripts as dictated in a particular cell or tissue. Although they interact with short 
stretches of DNA scattered 
throughout the entire genome, most gene expression is regulated near the site at which 
transcription starts or within the open reading frame of the gene being expressed. The 
regulated stretches of the DNA can be simple and interact with only a single protein, or 
they can require several proteins acting as part of a complex to regulate gene expression. 
The external features of the double helix which provide recognition sites are hydrogen 
bond donor and acceptor groups, hydrophobic patches, major and minor grooves, and 
regular, repeated stretches of sequences which cause distinct bends in the helix. The 
surface features of the regulatory molecule are complementary to those of the DNA. 

Many of the transcription factors incorporate one of a set of DNA-binding 
structural motifs, each of which contains either α helices or β sheets and binds to the major 
groove of DNA. Seven of the structural motifs common to transcription factors are 
helix-turn-helix, homeodomains, zinc finger, steroid receptor, β sheets, leucine zipper, and 
helix-loop-helix. (Pabo, C.O. and R.T. Sauer (1992) Ann. Rev. Biochem. 61:1053-95.) Other 
domains of transcription factors may form crucial contacts with the DNA. In addition, 
accessory proteins provide important interactions which may convert a particular protein 
complex to an activator or a repressor or may prevent binding. (Alberts, B. et al. (1994) 
Molecular Biology of the Cell, Garland Publishing Co., New York, NY, pp. 401-474.) 

The discovery of new human signal peptide-containing proteins and the 
polynucleotides encoding these molecules satisfies a need in the art by providing new 
compositions which are useful in the diagnosis, treatment, and prevention of cancer and 
immunological disorders. 



SUMMARY OF THE INVENTION 

The invention features a substantially purified human signal peptide-containing 
protein (SIGP), having an amino acid sequence selected from the group consisting of SEQ 
ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, SEQ ID NO:6, 
SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID NO:11, SEQ ID 
NO:12, SEQ ID NO:13, SEQ ID NO:14, and SEQ ID NO:15. 

The invention further provides isolated and substantially purified polynucleotides 
encoding SIGP. In a particular aspect, the polynucleotide has a nucleic acid sequence 
selected from the group consisting of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, 
SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ 
ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID 
NO:29, and SEQ ID NO:30. 

In addition, the invention provides a polynucleotide, or fragment thereof, which 
hybridizes to any of the polynucleotides encoding an SIGP selected from the group 
consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5, 
SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID 
NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, and SEQ ID NO:15. In another 
aspect, the invention provides a composition comprising isolated and purified 
polynucleotides selected from the group consisting of SEQ ID NO:16, SEQ ID NO:17, 
SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ 
ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID 
NO:28, SEQ ID NO:29, and SEQ ID NO:30, or a fragment thereof. 

The invention further provides a polynucleotide comprising the complement, or 
fragments thereof, of any one of the polynucleotides encoding SIGP. In another aspect, 
the invention provides compositions comprising isolated and purified polynucleotides 
comprising the complement of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID 
NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, 
SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and 
SEQ ID NO:30, or fragments thereof. 

The present invention further provides an expression vector containing at least a 
fragment of any one of the polynucleotides selected from the group consisting of SEQ ID 
NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, 
SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ 
ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and SEQ ID NO:30. In yet another aspect, 
the expression vector containing the polynucleotide is contained within a host cell. 

The invention also provides a method for producing a polypeptide or a fragment 
thereof, the method comprising the steps of: (a) culturing the host cell containing an 
expression vector containing at least a fragment of a polynucleotide encoding SIGP under 
conditions suitable for the expression of the polypeptide; and (b) recovering the 
polypeptide from the host cell culture. 

The invention also provides a pharmaceutical composition comprising a 
substantially purified SIGP in conjunction with a suitable pharmaceutical carrier. 

The invention further includes a purified antibody which binds to SIGP, as well as 
a purified agonist and a purified antagonist of SIGP. 

The invention also provides a method for treating or preventing a cancer associated 
with the decreased expression or activity of SIGP, the method comprising the step of 
administering to a subject in need of such treatment an effective amount of a 
pharmaceutical composition containing SIGP. 

The invention also provides a method for treating or preventing a cancer associated 
with the increased expression or activity of SIGP, the method comprising the step of 
administering to a subject in need of such treatment an effective amount of an antagonist 
of SIGP. 

The invention also provides a method for treating or preventing an immune 
response associated with the increased expression or activity of SIGP, the method 
comprising the step of administering to a subject in need of such treatment an effective 
amount of an antagonist of SIGP. 

The invention also provides a method for detecting a nucleic acid sequence which 
encodes a human regulatory protein in a biological sample, the method comprising the 
steps of: (a) hybridizing a nucleic acid sequence of the biological sample to a 
polynucleotide sequence complementary to the polynucleotide encoding SIGP, thereby 
forming a hybridization complex; and (b) detecting the hybridization complex, wherein the 
presence of the hybridization complex correlates with the presence of the nucleic acid 
sequence encoding the human regulatory protein in the biological sample. 

The invention also provides a microarray containing at least a fragment of at least 
one of the polynucleotides encoding a polypeptide having an amino acid sequence selected 
from the group consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, 
SEQ ID NO:5, SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID 
NO:10, SEQ ID NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, and SEQ ID 
NO:15. 

The invention also provides a method for detecting the expression level of a 
nucleic acid encoding a human regulatory protein in a biological sample, the method 
comprising the steps of hybridizing the nucleic acid sequence of the biological sample to a 
complementary polynucleotide, thereby forming a hybridization complex; and determining 
expression of the nucleic acid sequence encoding a human regulatory protein in the 
biological sample by identifying the presence of the hybridization complex. In a preferred 
embodiment, prior to the hybridizing step, the nucleic acid sequences of the biological 
sample are amplified and labeled by the polymerase chain reaction. 


DESCRIPTION OF THE INVENTION 

Before the present proteins, nucleotide sequences, and methods are described, it is 
understood that this invention is not limited to the particular methodology, protocols, cell 
lines, vectors, and reagents described, as these may vary. It is also to be understood that 
the terminology used herein is for the purpose of describing particular embodiments only, 
and is not intended to limit the scope of the present invention which will be limited only 
by the appended claims. 

It must be noted that as used herein and in the appended claims, the singular forms 
"a," "an," and "the" include plural reference unless the context clearly dictates otherwise. 
Thus, for example, a reference to "a host cell" includes a plurality of such host cells, and a 
reference to "an antibody" is a reference to one or more antibodies and equivalents thereof 
known to those skilled in the art, and so forth. 

Unless defined otherwise, all technical and scientific terms used herein have the 
same meanings as commonly understood by one of ordinary skill in the art to which this 
invention belongs. Although any methods and materials similar or equivalent to those 
described herein can be used in the practice or testing of the present invention, the 
preferred methods, devices, and materials are now described. All publications mentioned 
herein are cited for the purpose of describing and disclosing the cell lines, vectors, and 
methodologies which are reported in the publications and which might be used in 
connection with the invention. Nothing herein is to be construed as an admission that the 
invention is not entitled to antedate such disclosure by virtue of prior invention. 

DEFINITIONS 

"SIGP," as used herein, refers to the amino acid sequences of substantially purified 
SIGP obtained from any species, particularly a mammalian species, including bovine, 
ovine, porcine, murine, equine, and preferably the human species, from any source, 
whether natural, synthetic, semi-synthetic, or recombinant. 

The term "agonist," as used herein, refers to a molecule which, when bound to 
SIGP, increases or prolongs the duration of the effect of SIGP. Agonists may include 
proteins, nucleic acids, carbohydrates, or any other molecules which bind to and modulate 
the effect of SIGP. 

An "allele" or an "allelic sequence," as these terms are used herein, is an 
alternative form of the gene encoding SIGP. Alleles may result from at least one mutation 
in the nucleic acid sequence and may result in altered mRNAs or in polypeptides whose 
structure or function may or may not be altered. Any given natural or recombinant gene 
may have none, one, or many allelic forms. Common mutational changes which give rise 
to alleles are generally ascribed to natural deletions, additions, or substitutions of 
nucleotides. Each of these types of changes may occur alone, or in combination with the 
others, one or more times in a given sequence. 

"Altered" nucleic acid sequences encoding SIGP, as described herein, include those 
sequences with deletions, insertions, or substitutions of different nucleotides, resulting in a 
polynucleotide that encodes the same SIGP or a polypeptide with at least one functional characteristic 
of SIGP. Included within this definition are polymorphisms which may or may not be 
readily detectable using a particular oligonucleotide probe of the polynucleotide encoding 
SIGP, and improper or unexpected hybridization to alleles, with a locus other than the 
normal chromosomal locus for the polynucleotide sequence encoding SIGP. The encoded 
protein may also be "altered," and may contain deletions, insertions, or substitutions of 
amino acid residues which produce a silent change and result in a functionally equivalent 
SIGP. Deliberate amino acid substitutions may be made on the basis of similarity in 
polarity, charge, solubility, hydrophobicity, hydrophilicity, and/or the amphipathic nature 
of the residues, as long as the biological or immunological activity of SIGP is retained. 
For example, negatively charged amino acids may include aspartic acid and glutamic acid, 
positively charged amino acids may include lysine and arginine, and amino acids with 
uncharged polar head groups having similar hydrophilicity values may include leucine, 
isoleucine, and valine; glycine and alanine; asparagine and glutamine; serine and 
threonine; and phenylalanine and tyrosine. 
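
The residue groupings listed above can be written down as a small lookup table. The following Python sketch is illustrative only: the group labels, the one-letter residue codes, and the helper name `shared_group` are assumptions introduced here, and membership in a common group does not by itself establish that the biological or immunological activity of SIGP is retained.

```python
# Residue groups transcribed from the paragraph above, using standard
# one-letter amino acid codes. Group labels are hypothetical names.
SUBSTITUTION_GROUPS = {
    "negatively charged": {"D", "E"},              # aspartic acid, glutamic acid
    "positively charged": {"K", "R"},              # lysine, arginine
    "leucine/isoleucine/valine": {"L", "I", "V"},
    "glycine/alanine": {"G", "A"},
    "asparagine/glutamine": {"N", "Q"},
    "serine/threonine": {"S", "T"},
    "phenylalanine/tyrosine": {"F", "Y"},
}


def shared_group(residue_a, residue_b):
    """Return the label of the group containing both residues, or None."""
    for label, members in SUBSTITUTION_GROUPS.items():
        if residue_a in members and residue_b in members:
            return label
    return None
```

For instance, `shared_group("D", "E")` returns the "negatively charged" label, whereas `shared_group("L", "K")` returns `None` because leucine and lysine fall in different groups.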

The terms "amino acid" or "amino acid sequence," as used herein, refer to an 
oligopeptide, peptide, polypeptide, or protein sequence, or a fragment of any of these, and 
to naturally occurring or synthetic molecules. In this context, "fragments," "immunogenic 
fragments," or "antigenic fragments" refer to fragments of SIGP which are preferably 
about 5 to about 15 amino acids in length and which retain some biological activity or 
immunological activity of SIGP. Where "amino acid sequence" is recited herein to refer 
to an amino acid sequence of a naturally occurring protein molecule, "amino acid 
sequence" and like terms are not meant to limit the amino acid sequence to the complete 
native amino acid sequence associated with the recited protein molecule. 

"Amplification," as used herein, relates to the production of additional copies of a 
nucleic acid sequence. Amplification is generally carried out using polymerase chain 
reaction (PCR) technologies well known in the art. (See, e.g., Dieffenbach, C.W. and 
G.S. Dveksler (1995) PCR Primer, a Laboratory Manual, Cold Spring Harbor Press, 
Plainview, NY, pp. 1-5.) 

The term "antagonist," as it is used herein, refers to a molecule which, when bound 
to SIGP, decreases the amount or the duration of the effect of the biological or 
immunological activity of SIGP. Antagonists may include proteins, nucleic acids, 
carbohydrates, antibodies, or any other molecules which decrease the effect of SIGP. 

As used herein, the term "antibody" refers to intact molecules as well as to 
fragments thereof, such as Fab, F(ab')2, and Fv fragments, which are capable of binding the 
epitopic determinant. Antibodies that bind SIGP polypeptides can be prepared using intact 
polypeptides or using fragments containing small peptides of interest as the immunizing 
antigen. The polypeptide or oligopeptide used to immunize an animal (e.g., a mouse, a rat, 
or a rabbit) can be derived from the translation of RNA, or synthesized chemically, and 
can be conjugated to a carrier protein if desired. Commonly used carriers that are 
chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole 
limpet hemocyanin (KLH). The coupled peptide is then used to immunize the animal. 

The term "antigenic determinant," as used herein, refers to that fragment of a 
molecule (i.e., an epitope) that makes contact with a particular antibody. When a protein 
or a fragment of a protein is used to immunize a host animal, numerous regions of the 
protein may induce the production of antibodies which bind specifically to antigenic 
determinants (given regions or three-dimensional structures on the protein). An antigenic 
determinant may compete with the intact antigen (i.e., the immunogen used to elicit the 
immune response) for binding to an antibody. 

The term "antisense," as used herein, refers to any composition containing a 
nucleic acid sequence which is complementary to a specific nucleic acid sequence. The 
term "antisense strand" is used in reference to a nucleic acid strand that is complementary 
to the "sense" strand. Antisense molecules may be produced by any method including 
synthesis or transcription. Once introduced into a cell, the complementary nucleotides 
combine with natural sequences produced by the cell to form duplexes and to block either 
transcription or translation. The designation "negative" can refer to the antisense strand, 
and the designation "positive" can refer to the sense strand. 

As used herein, the term "biologically active" refers to a protein having structural, 
regulatory, or biochemical functions of a naturally occurring molecule. Likewise, 
"immunologically active" refers to the capability of the natural, recombinant, or synthetic 
SIGP, or of any oligopeptide thereof, to induce a specific immune response in appropriate 
animals or cells and to bind with specific antibodies. 

The terms "complementary" or "complementarity," as used herein, refer to the 
natural binding of polynucleotides under permissive salt and temperature conditions by 
base pairing. For example, the sequence "A-G-T" binds to the complementary sequence 
"T-C-A." Complementarity between two single-stranded molecules may be "partial," 
such that only some of the nucleic acids bind, or it may be "complete," such that total 
complementarity exists between the single stranded molecules. The degree of 
complementarity between nucleic acid strands has significant effects on the efficiency and 
strength of the hybridization between the nucleic acid strands. This is of particular 
importance in amplification reactions, which depend upon binding between nucleic acid 
strands, and in the design and use of peptide nucleic acid (PNA) molecules. 
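
The "A-G-T" and "T-C-A" example above, together with the distinction between partial and complete complementarity, can be sketched in Python as follows. This is an illustrative sketch only: the function names are hypothetical, sequences are assumed to be written in the hyphen-separated notation used above, and pairing is evaluated position by position in the order in which the two sequences are written.

```python
# Watson-Crick pairing partners for DNA bases.
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(sequence):
    """Return the position-by-position complement of a sequence such as 'A-G-T'."""
    return "-".join(DNA_PAIRS[base] for base in sequence.split("-"))


def fraction_paired(strand_a, strand_b):
    """Return the fraction of aligned positions whose bases pair with each other."""
    bases_a = strand_a.split("-")
    bases_b = strand_b.split("-")
    if len(bases_a) != len(bases_b):
        raise ValueError("strands must have the same number of bases")
    paired = sum(1 for a, b in zip(bases_a, bases_b) if DNA_PAIRS[a] == b)
    return paired / len(bases_a)
```

Under these assumptions, a result of 1.0 from `fraction_paired` corresponds to "complete" complementarity, and any value between 0 and 1 corresponds to "partial" complementarity.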

A "composition comprising a given polynucleotide sequence" or a "composition 
comprising a given amino acid sequence," as these terms are used herein, refer broadly to 
any composition containing the given polynucleotide or amino acid sequence. The 
composition may comprise a dry formulation, an aqueous solution, or a sterile 
composition. Compositions comprising polynucleotides encoding SIGP, e.g., SEQ ID 
NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, 
SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ 
ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and SEQ ID NO:30, or fragments thereof, may be 
employed as hybridization probes. The probes may be stored in freeze-dried form and 
may be associated with a stabilizing agent such as a carbohydrate. In hybridizations, the 
probe may be deployed in an aqueous solution containing salts (e.g., NaCl), detergents 
(e.g., SDS) and other components (e.g., Denhardt's solution, dry milk, salmon sperm 
DNA, etc.). 

The phrase "consensus sequence," as used herein, refers to a nucleic acid sequence 
which has been resequenced to resolve uncalled bases, extended using XL-PCR™ (Perkin 
Elmer, Norwalk, CT) in the 5' and/or the 3' direction, and resequenced, or which has been 
assembled from the overlapping sequences of more than one Incyte Clone using a 
computer program for fragment assembly, such as the GELVIEW™ Fragment Assembly 
system (GCG, Madison, WI). Some sequences have been both extended and assembled to 
produce the consensus sequence. 

As used herein, the term "correlates with expression of a polynucleotide" indicates 
that the detection of the presence of nucleic acids, the same or related to a nucleic acid 
sequence encoding SIGP, by northern analysis is indicative of the presence of nucleic 
acids encoding SIGP in a sample, and thereby correlates with expression of the transcript 
from the polynucleotide encoding SIGP. 

The term "SIGP" refers to any or all of the human polypeptides, SIGP-1, SIGP-2, 
SIGP-3, SIGP-4, SIGP-5, SIGP-6, SIGP-7, SIGP-8, SIGP-9, SIGP-10, SIGP-11, SIGP-12, 
SIGP-13, SIGP-14, and SIGP-15. 

A "deletion," as the term is used herein, refers to a change in the amino acid or 
nucleotide sequence that results in the absence of one or more amino acid residues or 
nucleotides. 

The term "derivative," as used herein, refers to the chemical modification of SIGP, 
of a polynucleotide sequence encoding SIGP, or of a polynucleotide sequence 
complementary to a polynucleotide sequence encoding SIGP. Chemical modifications of a 
polynucleotide sequence can include, for example, replacement of hydrogen by an alkyl, 
acyl, or amino group. A derivative polynucleotide encodes a polypeptide which retains at 
least one biological or immunological function of the natural molecule. A derivative 
polypeptide is one modified by glycosylation, pegylation, or any similar process that 
retains at least one biological or immunological function of the polypeptide from which it 
was derived. 

The term "homology," as used herein, refers to a degree of complementarity. 
There may be partial homology or complete homology. The word "identity" may 
substitute for the word "homology." A partially complementary sequence that at least 
partially inhibits an identical sequence from hybridizing to a target nucleic acid is referred 
to as "substantially homologous." The inhibition of hybridization of the completely 
complementary sequence to the target sequence may be examined using a hybridization 
assay (Southern or northern blot, solution hybridization, and the like) under conditions of 
reduced stringency. A substantially homologous sequence or hybridization probe will 
compete for and inhibit the binding of a completely homologous sequence to the target 
sequence under conditions of reduced stringency. This is not to say that conditions of 
reduced stringency are such that non-specific binding is permitted, as reduced stringency 
conditions require that the binding of two sequences to one another be a specific (i.e., a 
selective) interaction. The absence of non-specific binding may be tested by the use of a 
second target sequence which lacks even a partial degree of complementarity (e.g., less 
than about 30% homology or identity). In the absence of non-specific binding, the 
substantially homologous sequence or probe will not hybridize to the second 
non-complementary target sequence. 

The phrases "percent identity" or "% identity" refer to the percentage of sequence 
similarity found in a comparison of two or more amino acid or nucleic acid sequences. 
Percent identity can be determined electronically, e.g., by using the MegAlign program 
(Lasergene software package, DNASTAR, Inc., Madison WI). The MegAlign program 
can create alignments between two or more sequences according to different methods, e.g., 
the Clustal Method. (Higgins, D.G. and Sharp, P.M. (1988) Gene 73:237-244.) The 
Clustal algorithm groups sequences into clusters by examining the distances between all 
pairs. The clusters are aligned pairwise and then in groups. The percentage similarity 
between two amino acid sequences, e.g., sequence A and sequence B, is calculated by 
dividing the length of sequence A, minus the number of gap residues in sequence A, minus 
the number of gap residues in sequence B, into the sum of the residue matches between 
sequence A and sequence B, times one hundred. Gaps of low or of no homology between 
the two amino acid sequences are not included in determining percentage similarity. 
Percent identity between nucleic acid sequences can also be calculated by the Clustal 
Method, or by other methods known in the art, such as the Jotun Hein Method. (See, e.g., 
Hein, J. (1990) Methods in Enzymology 183:626-645.) Identity between sequences can 
also be determined by other methods known in the art, e.g., by varying hybridization 
conditions. 
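
The percentage similarity calculation described above can be sketched in Python for two sequences that have already been aligned. This is a minimal illustration and not the MegAlign implementation: the function name is hypothetical, gaps are assumed to be written as "-", and "the length of sequence A" is assumed to mean the length of sequence A as it appears in the alignment, gaps included.

```python
def percent_similarity(aligned_a, aligned_b, gap="-"):
    """Percentage similarity of two pre-aligned sequences of equal length.

    Computed as 100 * matches / (len(A) - gaps in A - gaps in B), where
    len(A) is the aligned length of sequence A (an assumption of this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    denominator = len(aligned_a) - aligned_a.count(gap) - aligned_b.count(gap)
    if denominator <= 0:
        raise ValueError("no ungapped positions to compare")
    # Count positions where both sequences carry the same residue.
    matches = sum(
        1 for a, b in zip(aligned_a, aligned_b) if a == b and a != gap
    )
    return 100.0 * matches / denominator
```

For the hypothetical alignment of "MKT-LV" with "MRTALV", there are four matching residues and five ungapped positions, so the function returns 80.0.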



"Human artificial chromosomes" (HACs), as described herein, are linear 
microchromosomes which may contain DNA sequences of about 6 kb to 10 Mb in size, 
and which contain all of the elements required for stable mitotic chromosome segregation 
and maintenance. (See, e.g., Harrington, J.J. et al. (1997) Nat. Genet. 15:345-355.) 

The term "humanized antibody," as used herein, refers to antibody molecules in 
which the amino acid sequence in the non-antigen binding regions has been altered so that 
the antibody more closely resembles a human antibody, and still retains its original 
binding ability. 

"Hybridization," as the term is used herein, refers to any process by which a strand 
of nucleic acid binds with a complementary strand through base pairing. 

The term "hybridization complex," as used herein, refers to a 
complex formed between two nucleic acid sequences by virtue of the formation of 
hydrogen bonds between complementary bases. A hybridization complex may be formed 
in solution (e.g., C0t or R0t analysis) or formed between one nucleic acid sequence present 
in solution and another nucleic acid sequence immobilized on a solid support (e.g., paper, 
membranes, filters, chips, pins or glass slides, or any other appropriate substrate to which 
cells or their nucleic acids have been fixed). 

The words "insertion" or "addition," as used herein, refer to changes in an amino 
acid or nucleotide sequence resulting in the addition of one or more amino acid residues or 
nucleotides, respectively, to the sequence found in the naturally occurring molecule. 

"Immune response" can refer to conditions associated with inflammation, trauma, 
immune disorders, or infectious or genetic disease, etc. These conditions can be 
characterized by expression of various factors, e.g., cytokines, chemokines, and other 
signaling molecules, which may affect cellular and systemic defense systems. 

The term "microarray," as used herein, refers to an array of distinct polynucleotides 
or oligonucleotides arrayed on a substrate, such as paper, nylon or any other type of 
membrane, filter, chip, glass slide, or any other suitable solid support. 

The term "modulate," as it appears herein, refers to a change in the activity of 
SIGP. For example, modulation may cause an increase or a decrease in protein activity, 
binding characteristics, or any other biological, functional, or immunological properties of 
SIGP. 

The phrases "nucleic acid" or "nucleic acid sequence," as used herein, refer to an 
oligonucleotide, nucleotide, polynucleotide, or any fragment thereof, to DNA or RNA of 
genomic or synthetic origin which may be single-stranded or double-stranded and may 
represent the sense or the antisense strand, to peptide nucleic acid (PNA), or to any 
DNA-like or RNA-like material. In this context, "fragments" refers to those nucleic acid 
sequences which are greater than about 60 nucleotides in length, and most preferably are at 
least about 100 nucleotides, at least about 1000 nucleotides, or at least about 10,000 
nucleotides in length. 

The terms "operably associated" or "operably linked," as used herein, refer to 
functionally related nucleic acid sequences. A promoter is operably associated or operably 
linked with a coding sequence if the promoter controls the transcription of the encoded 
polypeptide. While operably associated or operably linked nucleic acid sequences can be 
contiguous and in reading frame, certain genetic elements, e.g., repressor genes, are not 
contiguously linked to the encoded polypeptide but still bind to operator sequences that 
control expression of the polypeptide. 

The term "oligonucleotide," as used herein, refers to a nucleic acid sequence of at 
least about 6 nucleotides to 60 nucleotides, preferably about 15 to 30 nucleotides, and 
most preferably about 20 to 25 nucleotides, which can be used in PCR amplification or in 
a hybridization assay or microarray. As used herein, the term "oligonucleotide" is 
substantially equivalent to the terms "amplimers," "primers," "oligomers," and "probes," 
as these terms are commonly defined in the art. 

"Peptide nucleic acid" (PNA), as used herein, refers to an antisense molecule or 
anti-gene agent which comprises an oligonucleotide of at least about 5 nucleotides in 
length linked to a peptide backbone of amino acid residues ending in lysine. The terminal 
lysine confers solubility to the composition. PNAs preferentially bind complementary 
single stranded DNA and RNA and stop transcript elongation, and may be pegylated to 
extend their lifespan in the cell. (See, e.g., Nielsen, P.E. et al. (1993) Anticancer Drug 
Des. 8:53-63.) 

The term "sample," as used herein, is used in its broadest sense. A biological 
sample suspected of containing nucleic acids encoding SIGP, or fragments thereof, or 
SIGP itself may comprise a bodily fluid; an extract from a cell, chromosome, organelle, or 
membrane isolated from a cell; a cell; genomic DNA, RNA, or cDNA, in solution or 
bound to a solid support; a tissue; a tissue print; etc. 

As used herein, the terms "specific binding" or "specifically binding" refer to that 
interaction between a protein or peptide and an agonist, an antibody, or an antagonist. The 
interaction is dependent upon the presence of a particular structure of the protein 
recognized by the binding molecule (i.e., the antigenic determinant or epitope). For 
example, if an antibody is specific for epitope "A," the presence of a polypeptide 
containing the epitope A, or the presence of free unlabeled A, in a reaction containing free 
labeled A and the antibody will reduce the amount of labeled A that binds to the antibody. 

As used herein, the term "stringent conditions" refers to conditions which permit 
hybridization between polynucleotide sequences and the claimed polynucleotide 
sequences. Suitably stringent conditions can be defined by, for example, the 
concentrations of salt or formamide in the prehybridization and hybridization solutions, or 
by the hybridization temperature, and are well known in the art. In particular, stringency 
can be increased by reducing the concentration of salt, increasing the concentration of 
formamide, or raising the hybridization temperature. 

For example, hybridization under high stringency conditions could occur in about 
50% formamide at about 37°C to 42°C. Hybridization could occur under reduced 
stringency conditions in about 35% to 25% formamide at about 30°C to 35°C. In 
particular, hybridization could occur under high stringency conditions at 42°C in 50% 
formamide, 5X SSPE, 0.3% SDS, and 200 μg/ml sheared and denatured salmon sperm 
DNA. Hybridization could occur under reduced stringency conditions as described above, 
but in 35% formamide at a reduced temperature of 35°C. The temperature range 
corresponding to a particular level of stringency can be further narrowed by calculating the 
purine to pyrimidine ratio of the nucleic acid of interest and adjusting the temperature 
accordingly. Variations on the above ranges and conditions are well known in the art. 
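
The purine to pyrimidine ratio mentioned above can be computed directly from a nucleotide sequence, as in the following Python sketch. The function name is hypothetical, and the sketch does not attempt the temperature adjustment itself, since no formula relating the ratio to temperature is given here.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def purine_pyrimidine_ratio(sequence):
    """Return (number of purines) / (number of pyrimidines) for a DNA or RNA sequence."""
    sequence = sequence.upper()
    purines = sum(1 for base in sequence if base in PURINES)
    pyrimidines = sum(1 for base in sequence if base in PYRIMIDINES)
    if pyrimidines == 0:
        raise ValueError("sequence contains no pyrimidines")
    return purines / pyrimidines
```

For the hypothetical sequence "AGGCT", which contains three purines and two pyrimidines, the function returns 1.5.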

The term "substantially purified," as used herein, refers to nucleic acid or amino 
acid sequences that are removed from their natural environment and are isolated or 
separated, and are at least about 60% free, preferably about 75% free, and most preferably 
about 90% free from other components with which they are naturally associated. 

A "substitution," as used herein, refers to the replacement of one or more amino 
acids or nucleotides by different amino acids or nucleotides, respectively. 

"Transformation," as defined herein, describes a process by which exogenous DNA 
enters and changes a recipient cell. Transformation may occur under natural or artificial 
conditions according to various methods well known in the art, and may rely on any 
known method for the insertion of foreign nucleic acid sequences into a prokaryotic or 
eukaryotic host cell. The method for transformation is selected based on the type of host 
5 cell being transformed and may include, but is not limited to, viral infection, 
electroporation, heat shock, lipofection, and particle bombardment. The term 
"transformed" cells includes stably transformed cells in which the inserted DNA is capable 
of replication either as an autonomously replicating plasmid or as part of the host 
chromosome, and refers to cells which transiently express the inserted DNA or RNA for 

10 limited periods of time. 

A "variant" of SIGP, as used herein, refers to an amino acid sequence that is 
altered by one or more amino acids. The variant may have "conservative" changes, 
wherein a substituted amino acid has similar structural or chemical properties (e.g., 
replacement of leucine with isoleucine). More rarely, a variant may have
"nonconservative" changes (e.g., replacement of glycine with tryptophan). Analogous
minor variations may also include amino acid deletions or insertions, or both. Guidance in
determining which amino acid residues may be substituted, inserted, or deleted without
abolishing biological or immunological activity may be found using computer programs
well known in the art, for example, DNASTAR software.

THE INVENTION 

The invention is based on the discovery of new human signal peptide-containing
proteins, collectively referred to as SIGP and individually as SIGP-1, SIGP-2, SIGP-3,
SIGP-4, SIGP-5, SIGP-6, SIGP-7, SIGP-8, SIGP-9, SIGP-10, SIGP-11, SIGP-12,
SIGP-13, SIGP-14, and SIGP-15; the polynucleotides encoding SIGP (SEQ ID NO:16,
SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ
ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID
NO:27, SEQ ID NO:28, SEQ ID NO:29, and SEQ ID NO:30); and the use of these
compositions for the diagnosis, treatment, or prevention of cancer and immunological
disorders. Table 1 shows the sequence identification numbers, Incyte Clone identification
number, cDNA library, NCBI sequence identifier, and GenBank species description for
each of the human signal peptide-containing proteins disclosed herein.

Nucleic acids encoding the SIGP-1 of the present invention were first identified in
Incyte Clone 866885 from the brain tumor cDNA library (BRAITUT03) using a computer
search for amino acid sequence alignments. A consensus sequence, SEQ ID NO:16, was
derived from Incyte Clones 866885 (BRAITUT03), 2991983 (KJDNFET02), 067954
(HUVESTB01), and 1499109 (SINTBST01).

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:1. SIGP-1 is 236 amino acids in length and has a
potential N-glycosylation site at N199; two potential casein kinase II phosphorylation
sites at S8 and T72; a potential N-myristoylation site at G169; and three potential protein
kinase C phosphorylation sites at T43, S96, and T201. SIGP-1 shares 24% identity with
rat syntaxin (GI 1488683). The fragment of SEQ ID NO:16 from about nucleotide 43 to
about nucleotide 93 is useful for hybridization. Northern analysis shows the expression of
this sequence in hematopoietic and immune, reproductive, gastrointestinal, neural,
cardiovascular, and developmental cDNA libraries. Approximately 43% of these libraries
are associated with neoplastic disorders, 26% with inflammation, and 19% with cell
proliferation.
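The specification does not state which tool was used to locate the potential modification sites listed for each SIGP. As a hedged sketch, sites of this kind can be found by scanning a protein sequence for short consensus motifs. The three patterns below follow widely used PROSITE-style consensus definitions and are assumptions of this example, as are the function names and the toy peptide, which is not a SIGP sequence.

```python
import re

# PROSITE-style consensus patterns written as regular expressions.
# These are assumptions of this sketch; the text does not name the tool used.
MOTIFS = {
    "N-glycosylation": r"N[^P][ST][^P]",
    "casein kinase II phosphorylation": r"[ST]..[DE]",
    "protein kinase C phosphorylation": r"[ST].[RK]",
}


def find_sites(protein, pattern):
    """Return 1-based positions where the pattern starts, including overlaps."""
    # The lookahead lets overlapping motif occurrences all be reported.
    return [m.start() + 1 for m in re.finditer(f"(?=({pattern}))", protein)]


def annotate(protein):
    """Map each motif name to residue labels such as 'N3' or 'S8'."""
    return {
        name: [f"{protein[pos - 1]}{pos}" for pos in find_sites(protein, pattern)]
        for name, pattern in MOTIFS.items()
    }


# Hypothetical toy peptide, not a SIGP sequence.
print(annotate("AANGSAASAKTLLE"))
```

Matches of this kind are only potential sites, as the text itself says; short motifs occur frequently by chance, and actual modification would have to be confirmed experimentally.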

Nucleic acids encoding the SIGP-2 of the present invention were first identified 
in Incyte Clone 1273453 from the testicle cDNA library (TESTTUT02) using a 
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID 
NO:17, was derived from Incyte Clones 1273453 (TESTTUT02), 1970337
(UCMCL5T01), 1218926 (NEUTGMT01), 1881349 (LEUKNOT03), and 1722377
(BLADNT06). 

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:2. SIGP-2 is 267 amino acids in length and has a
potential N-glycosylation site at N230; five potential casein kinase II phosphorylation
sites at S9, T45, T77, S190, and T263; and two potential protein kinase C
phosphorylation sites at S232 and S236. The fragment of SEQ ID NO:17 from about
nucleotide 140 to about nucleotide 175 is useful for hybridization. Northern analysis
shows the expression of this sequence in reproductive, cardiovascular, and
hematopoietic and immune cDNA libraries. Approximately 42% of these libraries are
associated with neoplastic disorders and 40% with immune response.

Nucleic acids encoding the SIGP-3 of the present invention were first identified in
Incyte Clone 1534876 from the spleen cDNA library (SPLNNOT04) using a computer
search for amino acid alignments. A consensus sequence, SEQ ID NO:18, was derived
from Incyte Clones 1253004 (LUNGFET03), 1382838 (BRAITUT08), 1532501
(SPLNNOT04), 1534876 (SPLNNOT04), 1705806 (DUODNOT02), 1738301
(COLNNOT22), 1926209 (BRSTNOT02), and shotgun sequences SAOA00587,
SAOA02048, and SAOA03535.

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:3. SIGP-3 is 161 amino acids in length and has a
potential signal peptide sequence between M1 and C13. SIGP-3 also has 17 cysteine
residues with the potential for forming intramolecular disulfide bridges. Six of these
cysteine residues, between residues C129 and C152, are found in a signature sequence for
trypsin/alpha-amylase inhibitors that form a structure with intramolecular disulfide
bridges. SIGP-3 has two potential casein kinase II phosphorylation sites at T25 and S35;
and two potential protein kinase C phosphorylation sites at S35 and T87. The fragment of
SEQ ID NO:18 from about nucleotide 406 to about nucleotide 477, which encompasses
the trypsin/alpha-amylase inhibitor signature sequence, is useful for hybridization.
Northern analysis shows the expression of this sequence in gastrointestinal and male and
female reproductive cDNA libraries. Approximately 45% of these libraries are associated
with neoplastic disorders and 28% with inflammation and the immune response.

Nucleic acids encoding the SIGP-4 of the present invention were first identified in
Incyte Clone 1634813 from the cecal tissue cDNA library (COLNNOT19) using a
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID
NO:19, was derived from Incyte Clones 1634813 (COLNNOT19), 2904583
(THYMNOT05), 1634813 (COLNNOT19), and 1310492 (COLNFET02), and shotgun
sequence SAPA04436.

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:4. SIGP-4 is 150 amino acids in length and has one
potential N-glycosylation site at N139; and five potential phosphorylation sites at T48,
S118, S126, S135, and S136. SIGP-4 also has a potential signal peptide sequence
encompassing residues M1-A23. SIGP-4 shares 28% identity with mouse beta
chemokine, Exodus-2 (GI 2196924). The fragment of SEQ ID NO:19 from about
nucleotide 175 to about nucleotide 235 is useful for hybridization. Northern analysis
shows the expression of this sequence in gastrointestinal, developmental, hematopoietic,
and immunological cDNA libraries. Approximately 50% of these libraries are associated
with fetal development/cell proliferation and 25% with immune response.

Nucleic acids encoding the SIGP-5 of the present invention were first identified 
in Incyte Clone 1711840 from the prostate cDNA library (PROSNOT16) using a
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID 
NO:20, was derived from Incyte Clones 1711840 (PROSNOT16) and 2550483 
(LUNGTUT06) and shotgun sequence SAQA03185. 

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:5. SIGP-5 is 118 amino acids in length and has
three potential protein kinase C phosphorylation sites at S48, T103, and S109; and a
potential signal peptide sequence from M1 to A20. SIGP-5 shares 61% identity with
human midkine, a retinoic acid-responsive heparin binding factor involved in regulation
of growth and differentiation (GI 182651). The fragment of SEQ ID NO:20 from about
nucleotide 511 to about nucleotide 555 is useful for hybridization. Northern analysis
shows the expression of this sequence in reproductive, gastrointestinal, developmental, 
neural, and cardiovascular cDNA libraries. Approximately 58% of these libraries are 
associated with cancer, 16% with immune response, and 23% with fetal/proliferating 
cells. 

Nucleic acids encoding the SIGP-6 of the present invention were first identified
in Incyte Clone 1747327 from the stomach tumor cDNA library (STOMTUT02) using a
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID
NO:21, was derived from Incyte Clones 475228 (MMLR2DT01), 1500771
(SINTBST01), 1880656 (LEUKNOT03), 1747327 (STOMTUT02), and 2720285
(LUNGTUT10).

In one embodiment, the invention encompasses a polypeptide comprising the 
amino acid sequence of SEQ ID NO:6. SIGP-6 is 248 amino acids in length and has 
one potential N-glycosylation site at N56; three potential casein kinase II 
phosphorylation sites at S46, S134, and S140; and one potential protein kinase C
phosphorylation site at T217. SIGP-6 shares 100% identity with human K12 protein
precursor, which is expressed in breast cancer cells and peripheral blood leukocytes (GI
2062391). Northern analysis shows the expression of this sequence in gastrointestinal,
reproductive, hematopoietic/immune, and cardiovascular cDNA libraries.
Approximately 59% of these libraries are associated with cancer and 35% with immune 
response. 

Nucleic acids encoding the SIGP-7 of the present invention were first identified
in Incyte Clone 1864292 from the diseased prostate cDNA library (PROSNOT19) using
a computer search for amino acid sequence alignments. A consensus sequence, SEQ ID 
NO:22, was derived from Incyte Clone 1864292 (PROSNOT19) and shotgun sequences 
SARA02195, SARA03070, SARA03675, and SATA02454. 

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:7. SIGP-7 is 404 amino acids in length and has
one potential amidation site at V136; one potential cAMP- and cGMP-dependent protein
kinase phosphorylation site at S66; twenty potential casein kinase II phosphorylation sites
at S23, T27, T74, S110, S111, S118, T122, S143, S145, S205, S207, S218, S219, S220,
T252, S254, S328, S330, S385, and T393; and twelve potential protein kinase C
phosphorylation sites at T27, S76, T81, S140, S161, S176, S229, T285, S309, S356, S367,
and S398. SIGP-7 shares 18% identity with the S. cerevisiae protein encoded by
SRP40, a weak suppressor of a mutant of the subunit AC40 of DNA-dependent RNA
polymerases I and II (GI 295671). The fragment of SEQ ID NO:22 from about
nucleotide 193 to about nucleotide 222 is useful for hybridization. Northern analysis
shows the expression of this sequence in reproductive, cardiovascular, and
hematopoietic/immune cDNA libraries. Approximately 75% of these libraries are
associated with cancer and 25% with immune response.

Nucleic acids encoding the SIGP-8 of the present invention were first identified 
in Incyte Clone 1866437 from the human promonocyte cell line cDNA library
(THP1NOT01) using a computer search for amino acid sequence alignments. A
consensus sequence, SEQ ID NO:23, was derived from Incyte Clones 817970 
(OVARTUT01), 825684 (PROSNOT06), 1866437 (THP1NOT01), 2190170 
(PROSNOT26), and 3137972 (SMCCNOT02). 

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:8. SIGP-8 is 405 amino acids in length and has
one potential N-glycosylation site at N378; one potential cAMP- and cGMP-dependent
protein kinase phosphorylation site at S332; nine potential casein kinase II
phosphorylation sites at T34, S51, T77, S107, S158, S264, T266, S296, and S332; and
one potential protein
kinase C phosphorylation site at S68. The fragment of SEQ ID NO:23 from about 
nucleotide 85 to about nucleotide 144 is useful for hybridization. Northern analysis 
shows the expression of this sequence in reproductive, hematopoietic/immune, neural, 
and developmental cDNA libraries. Approximately 37% of these libraries are 
associated with cancer, 33% with immune response, and 22% with fetal/proliferating 
cells. 

Nucleic acids encoding the SIGP-9 of the present invention were first identified 
in Incyte Clone 1871375 from the leg skin erythema nodosum cDNA library 
(SKINBIT01) using a computer search for amino acid sequence alignments. A 
consensus sequence, SEQ ID NO:24, was derived from Incyte Clones 1428052 
(SINTBST01), 1871375 (SKINBIT01), and 3210563 (BLADNOT08). 

In one embodiment, the invention encompasses a polypeptide comprising the 
amino acid sequence of SEQ ID NO:9. SIGP-9 is 177 amino acids in length and has
one potential casein kinase II phosphorylation site at S133; one potential
glycosaminoglycan attachment site at S28GGG; and four potential protein kinase C
phosphorylation sites at S44, S82, S115, and T148. SIGP-9 contains a signature
sequence shared by the binding domains of receptors for lymphokines, hematopoietic
growth factors, and growth hormone-related molecules at S52RWSLWS. The fragment
of SEQ ID NO:24 encoding the sequence surrounding the receptor binding domain
signature, from about nucleotide 190 to about nucleotide 249, is useful for hybridization.
Northern analysis shows the expression of this sequence in reproductive, cardiovascular,
gastrointestinal, and developmental cDNA libraries. Approximately 44% of these
libraries are associated with cancer and 19% with immune response.

Nucleic acids encoding the SIGP-10 of the present invention were first identified 
in Incyte Clone 1880830 from the leukocyte cDNA library (LEUKNOT03) using a 
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID 
NO:25, was derived from Incyte Clones 361577 (PROSNOT01), 2113591
(BRAITUT03), and 1880830 (LEUKNOT03), and shotgun sequences SATA03292 and
SATA00377.

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:10. SIGP-10 is 197 amino acids in length and has
a potential cAMP- and cGMP-dependent protein kinase phosphorylation site at S121; and
four potential protein kinase C phosphorylation sites at T3, S57, T107, and T153.
SIGP-10 shares 15% identity with the Arabidopsis thaliana zinc-finger protein Lsd1 (GI
1872521). The fragment of SEQ ID NO:25 from about nucleotide 567 to about
nucleotide 621 is useful for hybridization. Northern analysis shows the expression of
this sequence in neural and reproductive cDNA libraries. Approximately 49% of these
libraries are associated with neoplastic disorders, 24% with immune response, and 16%
with fetal development.

Nucleic acids encoding the SIGP-11 of the present invention were first identified
in Incyte Clone 2328134 from the colon cDNA library (COLNNOT11) using a
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID
NO:26, was derived from Incyte Clones 2328134 (COLNNOT11), 1870180
(SKINBIT01), 081403 (SYNORAB01), and 851547 (NGANNOT01).

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:11. SIGP-11 is 346 amino acids in length and has
two potential cAMP- and cGMP-dependent protein kinase phosphorylation sites at
residues S43 and S217; one potential casein kinase II phosphorylation site at residue
T96; and five potential protein kinase C phosphorylation sites at residues T2, T15, T39,
T247, and S301. SIGP-11 shares 33% identity with the human putative rab5-interacting
protein (GI 1911776) and the casein kinase II phosphorylation site at residue T96. The
fragment of SEQ ID NO:26 encoding the potential extracellular ligand binding domain
from about nucleotide 16 to about nucleotide 76 is useful for hybridization. Northern
analysis shows the expression of this sequence in reproductive, gastrointestinal,
cardiovascular, and neural cDNA libraries. Approximately 44% of these libraries are
associated with cancer, 28% with immune response, and 20% with fetal
disorders.

Nucleic acids encoding the SIGP-12 of the present invention were first identified in
Incyte Clone 2652271 from the thymus cDNA library (THYMNOT04) using a computer
search for amino acid sequence alignments. A consensus sequence, SEQ ID NO:27, was
derived from Incyte Clones 2652271 (THYMNOT04), 2742813 (BRSTTUT14), 763431
(BRAITUT02), 1272403 (TESTTUT02), 1240531 (LUNGNOT03), and 1315448
(BLADNOT04).

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:12. SIGP-12 is 256 amino acids in length and has
three potential N-glycosylation sites at N76, N106, and N212; three potential casein kinase
II phosphorylation sites at T46, S188, and T204; two potential protein kinase C
phosphorylation sites at S130 and S221; two potential ribonuclease T2 family histidine
active sites from W62 to P69 and from F110 to C121; and a potential signal peptide
sequence from M1 to A24. SIGP-12 shares 24% identity with Solanum lycopersicum
ribonuclease LE (GI 895855); 80% identity between W62 and P75, one of the two
ribonuclease T2 family histidine active sites; and 92% identity between F110 and C121,
the second of the two ribonuclease T2 family histidine active sites. The fragment of SEQ
ID NO:27 from about nucleotide 462 to about nucleotide 494 is useful for hybridization.
Northern analysis shows the expression of this sequence in reproductive, hematopoietic,
and gastrointestinal cDNA libraries. Approximately 53% of these libraries are associated
with neoplastic disorders and 28% with immune response.

Nucleic acids encoding the SIGP-13 of the present invention were first identified 
in Incyte Clone 2965248 from the cervical spinal cord cDNA library (SCORNOT04) 
using a computer search for amino acid sequence alignments. A consensus sequence,
SEQ ID NO:28, was derived from Incyte Clones 2965248 (SCORNOT04), 485746
(HNT2RAT01), 865684 (BRAITUT03), 1459157 (COLNFET02), 1597772 
(BRAINOT14), 531430 (BRAINOT03), 725362 (SYNOOAT01), 1620429 
(BRAITUT13), and 190305 (SYNORAB01). 

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:13. SIGP-13 is 235 amino acids in length and has
seven potential cAMP- and cGMP-dependent protein kinase phosphorylation sites at
S50, T80, T98, T126, S135, S136, and T194; three potential casein kinase II
phosphorylation sites at S60, T80, and S81; six potential protein kinase C
phosphorylation sites at S114, T119, T137, S142, S146, and S174; and a stathmin 1
family signature from P75 to E84. SIGP-13 shares 44% identity with human stathmin
homolog SCG10 neuron-specific growth-associated protein in Alzheimer's disease (GI
1478503), and 71% identity between M1 and A107. In addition, one potential cAMP-
and cGMP-dependent protein kinase phosphorylation site, one potential casein kinase II
phosphorylation site, the stathmin 1 family signature, and the hydrophobic
transmembrane domains are conserved between these molecules. TM1 extends from
about L15 to about F25; and TM2, from about G196 to about P212. The fragments of
SEQ ID NO:28 from about nucleotide 158 to about nucleotide 196 and from about
nucleotide 614 to about nucleotide 643 are useful for hybridization. Northern analysis
shows the expression of this sequence in neural, reproductive, gastrointestinal, and
hematopoietic/immune cDNA libraries. Approximately 50% of these libraries are
associated with neoplastic disorders and 19% with immune response.

Nucleic acids encoding the SIGP-14 of the present invention were first identified
in Incyte Clone 3057669 from the pons cDNA library (PONSAZT01) using a computer
search for amino acid sequence alignments. A consensus sequence, SEQ ID NO:29,
was derived from Incyte Clones 3057669 (PONSAZT01), 548211 (BEPINOT01),
3702516 (PENCNOT07), 3581270 (293TF3T01), 495191 (HNT2NOT01), 2784427
(BRSTNOT13), 1515961 (PANCTUT01), 3552333 (SYNONOT01), 2838668
(DRGLNOT01), 14600680 (COLNFET02), and 285677 (EOSIHET02).

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:14. SIGP-14 is 371 amino acids in length and has
three potential N-glycosylation sites at N70, N125, and N362; eleven potential casein
kinase II phosphorylation sites at T22, S66, S72, S73, S102, T160, T201, T215, T278,
T285, and S316; seven potential protein kinase C phosphorylation sites at S72, T79,
S99, T127, S134, S257, and T299; and one protein kinase signature and profile from
L188 to F200. Northern analysis shows the expression of this sequence in
gastrointestinal, reproductive, and neural cDNA libraries. Approximately 54% of these
libraries are associated with neoplastic disorders and 14% with immune response.

Nucleic acids encoding the SIGP-15 of the present invention were first identified in
Incyte Clone 3125156 from the lymph node cDNA library (LNODNOT05) using a
computer search for amino acid sequence alignments. A consensus sequence, SEQ ID
NO:30, was derived from Incyte Clones 3125156 (LNODNOT05), 1417459
(BRAINOT12), 1567861 (UTRSNOT05), 154233 (THP1PLB02), 872652
(LUNGAST01), 2525803 (BRAITUT21), and 1209172 (BRSTNOT02).

In one embodiment, the invention encompasses a polypeptide comprising the
amino acid sequence of SEQ ID NO:15. SIGP-15 is 523 amino acids in length and has
one potential N-glycosylation site at N186; nine potential casein kinase II
phosphorylation sites at S63, T85, S179, S188, T210, S231, T269, T295, and S474; one
potential glycosaminoglycan attachment site at S335; ten potential protein kinase C
phosphorylation sites at T9, S159, S172, S179, T246, S263, S283, S416, S447, and S498;
two potential tyrosine kinase phosphorylation sites at Y106 and Y170; and one tyrosine
specific protein phosphatase active site at V331. SIGP-15 shares 21% identity with human
T-cell protein tyrosine phosphatase (GI 804750), the N186 glycosylation site, the
phosphorylation sites at S179, S188, T210, T246, S263, T295, S416, and Y170; and 50%
identity between P324 and F344, the region of the tyrosine specific protein phosphatase
active site. The fragments of SEQ ID NO:30 from about nucleotide 64 to about nucleotide
183 and from about nucleotide 1087 to about nucleotide 1119 are useful for hybridization.
Northern analysis shows the expression of this sequence in neural, reproductive, and
gastrointestinal cDNA libraries. Approximately 55% of these libraries are associated with
neoplastic disorders and 22% with immune response.

The invention also encompasses SIGP variants. A preferred SIGP variant is one 
which has at least about 80%, more preferably at least about 90%, and most preferably at 
least about 95% amino acid sequence identity to the SIGP amino acid sequence, and which 
contains at least one functional or structural characteristic of SIGP. 
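As a hedged illustration of the identity thresholds recited above, percent identity can be computed over two sequences that have already been aligned. The sketch below assumes gap characters written as "-", counts identity over all aligned columns in which at least one sequence has a residue, and uses hypothetical function names and toy sequences. Other conventions for the denominator exist, and the specification does not say which one is intended.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a residue opposite
    a gap counts as a mismatch. This convention is an assumption of the sketch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def variant_tier(identity_percent):
    """Classify against the 80%, 90%, and 95% thresholds named in the text."""
    if identity_percent >= 95.0:
        return "most preferred"
    if identity_percent >= 90.0:
        return "more preferred"
    if identity_percent >= 80.0:
        return "preferred"
    return None


# Hypothetical toy alignment, not a SIGP sequence.
pid = percent_identity("MKT-LL", "MKTALI")
print(round(pid, 1), variant_tier(pid))
```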

The invention also encompasses polynucleotides which encode SIGP. Accordingly,
any nucleic acid sequence which encodes the amino acid sequence of SIGP can be used
to produce recombinant molecules which express SIGP. In a particular embodiment, the
invention encompasses a polynucleotide consisting of a nucleic acid sequence selected
from the group consisting of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID
NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24,
SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and
SEQ ID NO:30.

It will be appreciated by those skilled in the art that as a result of the degeneracy of
the genetic code, a multitude of polynucleotide sequences encoding SIGP, some bearing
minimal homology to the polynucleotide sequences of any known and naturally occurring
gene, may be produced. Thus, the invention contemplates each and every possible
variation of polynucleotide sequence that could be made by selecting combinations based
on possible codon choices. These combinations are made in accordance with the standard
triplet genetic code as applied to the polynucleotide sequence of naturally occurring SIGP,
and all such variations are to be considered as being specifically disclosed.
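To give a sense of scale for the "multitude" of coding sequences, the number of distinct polynucleotides that encode a given peptide under the standard genetic code is the product of the number of synonymous codons at each position. The following is a minimal sketch; the function name and the toy peptides are hypothetical and are not SIGP sequences.

```python
from math import prod

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODON_COUNTS = {
    "A": 4, "R": 6, "N": 2, "D": 2, "C": 2, "Q": 2, "E": 2, "G": 4, "H": 2, "I": 3,
    "L": 6, "K": 2, "M": 1, "F": 2, "P": 4, "S": 6, "T": 4, "W": 1, "Y": 2, "V": 4,
}
STOP_CODON_COUNT = 3  # TAA, TAG, TGA


def count_encodings(peptide, include_stop=False):
    """Count the distinct coding sequences for a peptide (one-letter codes)."""
    total = prod(CODON_COUNTS[residue] for residue in peptide.upper())
    return total * STOP_CODON_COUNT if include_stop else total


# Hypothetical toy peptides.
print(count_encodings("MKL"))
print(count_encodings("MKL", include_stop=True))
```

Because the count grows multiplicatively with length, even a protein of a few hundred residues has an astronomically large number of possible coding sequences.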

Although nucleotide sequences which encode SIGP and its variants are preferably
capable of hybridizing to the nucleotide sequence of the naturally occurring SIGP under
appropriately selected conditions of stringency, it may be advantageous to produce
nucleotide sequences encoding SIGP or its derivatives possessing a substantially different
codon usage. Codons may be selected to increase the rate at which expression of the
peptide occurs in a particular prokaryotic or eukaryotic host in accordance with the
frequency with which particular codons are utilized by the host. Other reasons for
substantially altering the nucleotide sequence encoding SIGP and its derivatives without
altering the encoded amino acid sequences include the production of RNA transcripts
having more desirable properties, such as a greater half-life, than transcripts produced
from the naturally occurring sequence.
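The idea of altering codon usage without altering the encoded amino acid sequence can be sketched as follows. The codon table is only an excerpt of the standard genetic code, and the host preferences are hypothetical placeholder values rather than measured codon-usage data; the function names and the toy input are likewise assumptions of this example.

```python
# Excerpt of the standard genetic code (only the codons used in this sketch).
CODON_TO_AA = {
    "ATG": "M",
    "AAA": "K", "AAG": "K",
    "TTA": "L", "TTG": "L", "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
    "TAA": "*", "TAG": "*", "TGA": "*",
}

# Hypothetical host preferences; real work would use measured codon-usage tables.
PREFERRED_CODON = {"M": "ATG", "K": "AAA", "L": "CTG", "S": "AGC", "*": "TAA"}


def split_codons(dna):
    """Split an in-frame coding sequence into triplets."""
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def translate(dna):
    """Translate using the excerpted table; '*' marks a stop codon."""
    return "".join(CODON_TO_AA[codon] for codon in split_codons(dna.upper()))


def recode_for_host(dna):
    """Replace each codon with the host-preferred synonymous codon."""
    return "".join(PREFERRED_CODON[CODON_TO_AA[codon]]
                   for codon in split_codons(dna.upper()))


native = "ATGAAGTTATCTTAG"  # hypothetical toy sequence, not a SIGP sequence
recoded = recode_for_host(native)
assert translate(recoded) == translate(native)  # encoded peptide is unchanged
print(recoded)
```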

The invention also encompasses production of DNA sequences which encode SIGP
and SIGP derivatives, or fragments thereof, entirely by synthetic chemistry. After
production, the synthetic sequence may be inserted into any of the many available
expression vectors and cell systems using reagents that are well known in the art.
Moreover, synthetic chemistry may be used to introduce mutations into a sequence
encoding SIGP or any fragment thereof.

Also encompassed by the invention are polynucleotide sequences that are capable of
hybridizing to the claimed polynucleotide sequences, and, in particular, to those shown in
SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19, SEQ ID NO:20, SEQ
ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ ID NO:25, SEQ ID
NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and SEQ ID NO:30, under
various conditions of stringency. (See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods
Enzymol. 152:399-407; and Kimmel, A.R. (1987) Methods Enzymol. 152:507-511.)

Methods for DNA sequencing are well known and generally available in the art and
may be used to practice any of the embodiments of the invention. The methods may
employ such enzymes as the Klenow fragment of DNA polymerase I, Sequenase® (US
Biochemical Corp., Cleveland, OH), Taq polymerase (Perkin Elmer), thermostable T7
polymerase (Amersham, Chicago, IL), or combinations of polymerases and proofreading
exonucleases such as those found in the ELONGASE Amplification System (GIBCO/BRL,
Gaithersburg, MD). Preferably, the process is automated with machines such as the
Hamilton Micro Lab 2200 (Hamilton, Reno, NV), Peltier Thermal Cycler (PTC200; MJ
Research, Watertown, MA) and the ABI Catalyst and 373 and 377 DNA Sequencers
(Perkin Elmer).

The nucleic acid sequences encoding SIGP may be extended utilizing a partial 
nucleotide sequence and employing various methods known in the art to detect upstream 
sequences, such as promoters and regulatory elements. For example, one method which 
may be employed, restriction-site PCR, uses universal primers to retrieve unknown
sequence adjacent to a known locus. (See, e.g., Sarkar, G. (1993) PCR Methods Applic.
2:318-322.) In particular, genomic DNA is first amplified in the presence of a primer
complementary to a linker sequence within the vector and a primer specific to the region
predicted to encode the gene. The amplified sequences are then subjected to a second
round of PCR with the same linker primer and another specific primer internal to the first
one. Products of each round of PCR are transcribed with an appropriate RNA polymerase
and sequenced using reverse transcriptase. 

Inverse PCR may also be used to amplify or extend sequences using divergent
primers based on a known region. (See, e.g., Triglia, T. et al. (1988) Nucleic Acids Res.
16:8186.) The primers may be designed using commercially available software such as
OLIGO 4.06 Primer Analysis software (National Biosciences Inc., Plymouth, MN) or
another appropriate program to be about 22 to 30 nucleotides in length, to have a GC
content of about 50% or more, and to anneal to the target sequence at temperatures of
about 68°C to 72°C. The method uses several restriction enzymes to generate a suitable
fragment in the known region of a gene. The fragment is then circularized by
intramolecular ligation and used as a PCR template.
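The length and GC-content guidelines for primers stated above can be checked with a few lines of code. This is only a sketch: it does not reproduce the OLIGO software, it does not estimate the annealing temperature, and the function names, default thresholds as parameters, and example primers are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def meets_primer_guidelines(primer, min_len=22, max_len=30, min_gc=50.0):
    """Check the 'about 22 to 30 nucleotides' and 'GC about 50% or more' criteria.

    The annealing-temperature criterion (about 68 to 72 deg C) is not evaluated.
    """
    return min_len <= len(primer) <= max_len and gc_percent(primer) >= min_gc


# Hypothetical candidate primers, not derived from any SIGP sequence.
candidates = [
    "ATGCGCATGCCGTAGCTAGGCCTA",   # 24 nt, GC-rich
    "ATGCGCATGCCGTAGCTAGG",       # 20 nt, too short
    "AT" * 11 + "GC",             # 24 nt, AT-rich
]
for primer in candidates:
    print(len(primer), round(gc_percent(primer), 1), meets_primer_guidelines(primer))
```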

Another method which may be used is capture PCR, which involves PCR
amplification of DNA fragments adjacent to a known sequence in human and yeast
artificial chromosome DNA. (See, e.g., Lagerstrom, M. et al. (1991) PCR Methods
Applic. 1:111-119.) In this method, multiple restriction enzyme digestions and ligations
may be used to place an engineered double-stranded sequence into an unknown fragment
of the DNA molecule before performing PCR. Other methods which may be used to
retrieve unknown sequences are known in the art. (See, e.g., Parker, J.D. et al. (1991)
Nucleic Acids Res. 19:3055-3060.) Additionally, one may use PCR, nested primers, and
PromoterFinder™ libraries to walk genomic DNA (Clontech, Palo Alto, CA). This
process avoids the need to screen libraries and is useful in finding intron/exon junctions.
When screening for full-length cDNAs, it is preferable to use libraries that have been
size-selected to include larger cDNAs. Also, random-primed libraries are preferable in
that they will include more sequences which contain the 5' regions of genes. Use of a
randomly primed library may be especially preferable for situations in which an oligo d(T)
library does not yield a full-length cDNA. Genomic libraries may be useful for extension
of sequence into 5' non-transcribed regulatory regions.

Capillary electrophoresis systems which are commercially available may be used to
analyze the size or confirm the nucleotide sequence of sequencing or PCR products. In
particular, capillary sequencing may employ flowable polymers for electrophoretic
separation, four different fluorescent dyes (one for each nucleotide) which are laser
activated, and a charge coupled device camera for detection of the emitted wavelengths.
Output/light intensity may be converted to electrical signal using appropriate software
(e.g., Genotyper™ and Sequence Navigator™, Perkin Elmer), and the entire process from 
loading of samples to computer analysis and electronic data display may be computer 
controlled. Capillary electrophoresis is especially preferable for the sequencing of small 
pieces of DNA which might be present in limited amounts in a particular sample. 

In another embodiment of the invention, polynucleotide sequences or fragments
thereof which encode SIGP may be used in recombinant DNA molecules to direct
expression of SIGP, or fragments or functional equivalents thereof, in appropriate host
cells. Due to the inherent degeneracy of the genetic code, other DNA sequences which
encode substantially the same or a functionally equivalent amino acid sequence may be
produced, and these sequences may be used to clone and express SIGP.

As will be understood by those of skill in the art, it may be advantageous to produce
SIGP-encoding nucleotide sequences possessing non-naturally occurring codons. For
example, codons preferred by a particular prokaryotic or eukaryotic host can be selected to
increase the rate of protein expression or to produce an RNA transcript having desirable
properties, such as a half-life which is longer than that of a transcript generated from the
naturally occurring sequence.
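
The selection of host-preferred codons described above can be sketched in a few lines of Python. This is only an illustration: the codon table below is an assumed one-codon-per-residue choice made for the example, not measured usage data for any particular host, and the function name back_translate is hypothetical.

```python
# Illustrative preferred-codon table (one codon per amino acid).
# The specific choices are assumptions for this sketch, not host usage data.
PREFERRED_CODONS = {
    "A": "GCG", "R": "CGT", "N": "AAC", "D": "GAT", "C": "TGC",
    "Q": "CAG", "E": "GAA", "G": "GGC", "H": "CAT", "I": "ATT",
    "L": "CTG", "K": "AAA", "M": "ATG", "F": "TTT", "P": "CCG",
    "S": "AGC", "T": "ACC", "W": "TGG", "Y": "TAT", "V": "GTG",
}


def back_translate(peptide, table=PREFERRED_CODONS):
    """Return a DNA coding sequence that uses one preferred codon per residue."""
    codons = []
    for residue in peptide.upper():
        if residue not in table:
            raise ValueError(f"no codon listed for residue {residue!r}")
        codons.append(table[residue])
    return "".join(codons)


if __name__ == "__main__":
    # Because the genetic code is degenerate, many DNA sequences encode the
    # same peptide; this picks exactly one of them.
    print(back_translate("MKL"))
```

A real design would draw the table from codon usage data for the chosen host and might also screen the result for unwanted restriction sites or secondary structure.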

The nucleotide sequences of the present invention can be engineered using methods
generally known in the art in order to alter SIGP-encoding sequences for a variety of
reasons including, but not limited to, alterations which modify the cloning, processing,
and/or expression of the gene product. DNA shuffling by random fragmentation and PCR
reassembly of gene fragments and synthetic oligonucleotides may be used to engineer the
nucleotide sequences. For example, site-directed mutagenesis may be used to insert new
restriction sites, alter glycosylation patterns, change codon preference, produce splice 
variants, introduce mutations, and so forth. 

In another embodiment of the invention, natural, modified, or recombinant nucleic 
acid sequences encoding SIGP may be ligated to a heterologous sequence to encode a
fusion protein. For example, to screen peptide libraries for inhibitors of SIGP activity, it
may be useful to encode a chimeric SIGP protein that can be recognized by a 
commercially available antibody. A fusion protein may also be engineered to contain a 
cleavage site located between the SIGP encoding sequence and the heterologous protein 
sequence, so that SIGP may be cleaved and purified away from the heterologous moiety. 

In another embodiment, sequences encoding SIGP may be synthesized, in whole or in
part, using chemical methods well known in the art. (See, e.g., Caruthers, M.H. et al.
(1980) Nucl. Acids Res. Symp. Ser. 215-223, and Horn, T. et al. (1980) Nucl. Acids Res.
Symp. Ser. 225-232.) Alternatively, the protein itself may be produced using chemical
methods to synthesize the amino acid sequence of SIGP, or a fragment thereof. For
example, peptide synthesis can be performed using various solid-phase techniques. (See,
e.g., Roberge, J.Y. et al. (1995) Science 269:202-204.) Automated synthesis may be
achieved using the ABI 431A Peptide Synthesizer (Perkin Elmer).

The newly synthesized peptide may be substantially purified by preparative high
performance liquid chromatography. (See, e.g., Chicz, R.M. and F.Z. Regnier (1990)
Methods Enzymol. 182:392-421.) The composition of the synthetic peptides may be
confirmed by amino acid analysis or by sequencing. (See, e.g., Creighton, T. (1983)
Proteins, Structures and Molecular Properties, WH Freeman and Co., New York, NY.)
Additionally, the amino acid sequence of SIGP, or any part thereof, may be altered during
direct synthesis and/or combined with sequences from other proteins, or any part thereof,
to produce a variant polypeptide.

In order to express a biologically active SIGP, the nucleotide sequences encoding 
SIGP or derivatives thereof may be inserted into an appropriate expression vector, i.e., a
vector which contains the necessary elements for the transcription and translation of the
inserted coding sequence. 

Methods which are well known to those skilled in the art may be used to construct 
expression vectors containing sequences encoding SIGP and appropriate transcriptional 
and translational control elements. These methods include in vitro recombinant DNA
techniques, synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook,
J. et al. (1989) Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press,
Plainview, NY, ch. 4, 8, and 16-17; and Ausubel, F.M. et al. (1995, and periodic
supplements) Current Protocols in Molecular Biology, John Wiley & Sons, New York,
NY, ch. 9, 13, and 16.)

A variety of expression vector/host systems may be utilized to contain and express 
sequences encoding SIGP. These include, but are not limited to, microorganisms such as 
bacteria transformed with recombinant bacteriophage, plasmid, or cosmid DNA expression 
vectors; yeast transformed with yeast expression vectors; insect cell systems infected with
virus expression vectors (e.g., baculovirus); plant cell systems transformed with virus
expression vectors (e.g., cauliflower mosaic virus (CaMV) or tobacco mosaic virus
(TMV)) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or animal cell 
systems. 

The invention is not limited by the host cell employed. 

The "control elements" or "regulatory sequences" are those non-translated regions,
e.g., enhancers, promoters, and 5' and 3' untranslated regions, of the vector and
polynucleotide sequences encoding SIGP which interact with host cellular proteins to
carry out transcription and translation. Such elements may vary in their strength and
specificity. Depending on the vector system and host utilized, any number of suitable
transcription and translation elements, including constitutive and inducible promoters, may
be used. For example, when cloning in bacterial systems, inducible promoters, e.g., hybrid
lacZ promoter of the Bluescript® phagemid (Stratagene, La Jolla, CA) or pSport1™
plasmid (GIBCO/BRL), may be used. The baculovirus polyhedrin promoter may be used
in insect cells. Promoters or enhancers derived from the genomes of plant cells (e.g., heat
shock, RUBISCO, and storage protein genes) or from plant viruses (e.g., viral promoters
or leader sequences) may be cloned into the vector. In mammalian cell systems,
promoters from mammalian genes or from mammalian viruses are preferable. If it is
necessary to generate a cell line that contains multiple copies of the sequence encoding
SIGP, vectors based on SV40 or EBV may be used with an appropriate selectable marker. 

In bacterial systems, a number of expression vectors may be selected depending upon
the use intended for SIGP. For example, when large quantities of SIGP are needed for the
induction of antibodies, vectors which direct high level expression of fusion proteins that
are readily purified may be used. Such vectors include, but are not limited to,
multifunctional E. coli cloning and expression vectors such as Bluescript® (Stratagene),
in which the sequence encoding SIGP may be ligated into the vector in frame with
sequences for the amino-terminal Met and the subsequent 7 residues of β-galactosidase so
that a hybrid protein is produced, and pIN vectors. (See, e.g., Van Heeke, G. and S.M.
Schuster (1989) J. Biol. Chem. 264:5503-5509.) pGEX vectors (Pharmacia Biotech,
Uppsala, Sweden) may also be used to express foreign polypeptides as fusion proteins
with glutathione S-transferase (GST). In general, such fusion proteins are soluble and can
easily be purified from lysed cells by adsorption to glutathione-agarose beads followed by
elution in the presence of free glutathione. Proteins made in such systems may be
designed to include heparin, thrombin, or factor XA protease cleavage sites so that the
cloned polypeptide of interest can be released from the GST moiety at will.

In the yeast Saccharomyces cerevisiae, a number of vectors containing constitutive or
inducible promoters, such as alpha factor, alcohol oxidase, and PGH, may be used. (See,
e.g., Ausubel, supra; and Grant et al. (1987) Methods Enzymol. 153:516-544.)

In cases where plant expression vectors are used, the expression of sequences 
encoding SIGP may be driven by any of a number of promoters. For example, viral
promoters such as the 35S and 19S promoters of CaMV may be used alone or in
combination with the omega leader sequence from TMV. (Takamatsu, N. (1987) EMBO
J. 6:307-311.) Alternatively, plant promoters such as the small subunit of RUBISCO or
heat shock promoters may be used. (See, e.g., Coruzzi, G. et al. (1984) EMBO J.
3:1671-1680; Broglie, R. et al. (1984) Science 224:838-843; and Winter, J. et al. (1991)
Results Probl. Cell Differ. 17:85-105.) These constructs can be introduced into plant cells
by direct DNA transformation or pathogen-mediated transfection. Such techniques are
described in a number of generally available reviews. (See, e.g., Hobbs, S. or Murry, L.E.
in McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York,
NY; pp. 191-196.)

An insect system may also be used to express SIGP. For example, in one such
system, Autographa californica nuclear polyhedrosis virus (AcNPV) is used as a vector to
express foreign genes in Spodoptera frugiperda cells or in Trichoplusia larvae. The
sequences encoding SIGP may be cloned into a non-essential region of the virus, such as
the polyhedrin gene, and placed under control of the polyhedrin promoter. Successful
insertion of sequences encoding SIGP will render the polyhedrin gene inactive and
produce recombinant virus lacking coat protein. The recombinant viruses may then be
used to infect, for example, S. frugiperda cells or Trichoplusia larvae in which SIGP may
be expressed. (See, e.g., Engelhard, E.K. et al. (1994) Proc. Nat. Acad. Sci.
91:3224-3227.)

In mammalian host cells, a number of viral-based expression systems may be utilized. 
In cases where an adenovirus is used as an expression vector, sequences encoding SIGP 
may be ligated into an adenovirus transcription/translation complex consisting of the late 
promoter and tripartite leader sequence. Insertion in a non-essential E1 or E3 region of the
viral genome may be used to obtain a viable virus which is capable of expressing SIGP in
infected host cells. (See, e.g., Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci.
81:3655-3659.) In addition, transcription enhancers, such as the Rous sarcoma virus
(RSV) enhancer, may be used to increase expression in mammalian host cells. 

Human artificial chromosomes (HACs) may also be employed to deliver larger
fragments of DNA than can be contained and expressed in a plasmid. HACs of about 6 kb
to 10 Mb are constructed and delivered via conventional delivery methods (liposomes,
polycationic amino polymers, or vesicles) for therapeutic purposes.

Specific initiation signals may also be used to achieve more efficient translation of 
sequences encoding SIGP. Such signals include the ATG initiation codon and adjacent
sequences. In cases where sequences encoding SIGP and its initiation codon and upstream
sequences are inserted into the appropriate expression vector, no additional transcriptional
or translational control signals may be needed. However, in cases where only coding
sequence, or a fragment thereof, is inserted, exogenous translational control signals
including the ATG initiation codon should be provided. Furthermore, the initiation codon
should be in the correct reading frame to ensure translation of the entire insert. Exogenous
translational elements and initiation codons may be of various origins, both natural and
synthetic. The efficiency of expression may be enhanced by the inclusion of enhancers
appropriate for the particular cell system used. (See, e.g., Scharf, D. et al. (1994) Results
Probl. Cell Differ. 20:125-162.) 
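
The reading-frame requirement described above lends itself to a simple check. The following Python sketch is a hypothetical helper (the name check_reading_frame and the rules it applies are assumptions for illustration); it only tests that an insert begins with ATG, has a length divisible by three, and contains no stop codon before its final codon.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_reading_frame(insert):
    """Return a list of problems that would prevent translation of the whole insert."""
    seq = insert.upper()
    problems = []
    if not seq.startswith("ATG"):
        problems.append("insert does not begin with an ATG initiation codon")
    if len(seq) % 3 != 0:
        problems.append("insert length is not a multiple of three")
    # Split into complete codons, ignoring any trailing partial codon.
    usable = len(seq) - len(seq) % 3
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    # A stop codon anywhere before the final codon truncates translation.
    for index, codon in enumerate(codons[:-1], start=1):
        if codon in STOP_CODONS:
            problems.append(f"premature stop codon {codon} at codon {index}")
    return problems


if __name__ == "__main__":
    print(check_reading_frame("ATGAAATAA"))   # in frame, stop only at the end
    print(check_reading_frame("ATGTAAAAA"))   # stop codon in second position
```

An empty list means none of these three problems was found; it does not establish that the insert will be expressed efficiently in a given host.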

In addition, a host cell strain may be chosen for its ability to modulate expression of 
the inserted sequences or to process the expressed protein in the desired fashion. Such 
modifications of the polypeptide include, but are not limited to, acetylation, carboxylation,
glycosylation, phosphorylation, lipidation, and acylation. Post-translational processing
which cleaves a "prepro" form of the protein may also be used to facilitate correct
insertion, folding, and/or function. Different host cells which have specific cellular
machinery and characteristic mechanisms for post-translational activities (e.g., CHO,
HeLa, MDCK, HEK293, and WI38), are available from the American Type Culture
Collection (ATCC, Bethesda, MD) and may be chosen to ensure the correct modification
and processing of the foreign protein. 

For long term, high yield production of recombinant proteins, stable expression is 
preferred. For example, cell lines capable of stably expressing SIGP can be transformed
using expression vectors which may contain viral origins of replication and/or endogenous
expression elements and a selectable marker gene on the same or on a separate vector.
Following the introduction of the vector, cells may be allowed to grow for about 1 to 2
days in enriched media before being switched to selective media. The purpose of the
selectable marker is to confer resistance to selection, and its presence allows growth and
recovery of cells which successfully express the introduced sequences. Resistant clones of
stably transformed cells may be proliferated using tissue culture techniques appropriate to 
the cell type. 

Any number of selection systems may be used to recover transformed cell lines. 
These include, but are not limited to, the herpes simplex virus thymidine kinase genes and
adenine phosphoribosyltransferase genes, which can be employed in tk⁻ or apr⁻ cells,
respectively. (See, e.g., Wigler, M. et al. (1977) Cell 11:223-232; and Lowy, I. et al.
(1980) Cell 22:817-823.) Also, antimetabolite, antibiotic, or herbicide resistance can be
used as the basis for selection. For example, dhfr confers resistance to methotrexate; npt
confers resistance to the aminoglycosides neomycin and G-418; and als or pat confer
resistance to chlorsulfuron and phosphinotricin acetyltransferase, respectively. (See, e.g.,
Wigler, M. et al. (1980) Proc. Natl. Acad. Sci. 77:3567-3570; Colbere-Garapin, F. et al.
(1981) J. Mol. Biol. 150:1-14; and Murry, supra.) Additional selectable genes have been
described, e.g., trpB, which allows cells to utilize indole in place of tryptophan, or hisD,
which allows cells to utilize histidinol in place of histidine. (See, e.g., Hartman, S.C. and
R.C. Mulligan (1988) Proc. Natl. Acad. Sci. 85:8047-8051.) Recently, the use of visible
markers has gained popularity with such markers as anthocyanins, β glucuronidase and its
substrate GUS, luciferase and its substrate luciferin. Green fluorescent proteins (GFP)
(Clontech, Palo Alto, CA) are also used. (See, e.g., Chalfie, M. et al. (1994) Science
263:802-805.) These markers can be used not only to identify transformants, but also to
quantify the amount of transient or stable protein expression attributable to a specific
vector system. (See, e.g., Rhodes, C.A. et al. (1995) Methods Mol. Biol. 55:121-131.)

Although the presence/absence of marker gene expression suggests that the gene of
interest is also present, the presence and expression of the gene may need to be confirmed.
For example, if the sequence encoding SIGP is inserted within a marker gene sequence,
transformed cells containing sequences encoding SIGP can be identified by the absence of
marker gene function. Alternatively, a marker gene can be placed in tandem with a
sequence encoding SIGP under the control of a single promoter. Expression of the marker
gene in response to induction or selection usually indicates expression of the tandem gene 
as well. 

Alternatively, host cells which contain the nucleic acid sequence encoding SIGP and
express SIGP may be identified by a variety of procedures known to those of skill in the
art. These procedures include, but are not limited to, DNA-DNA or DNA-RNA
hybridizations and protein bioassay or immunoassay techniques which include membrane,
solution, or chip based technologies for the detection and/or quantification of nucleic acid
or protein sequences.

The presence of polynucleotide sequences encoding SIGP can be detected by
DNA-DNA or DNA-RNA hybridization or amplification using probes or
fragments of polynucleotides encoding SIGP. Nucleic acid amplification based assays
involve the use of oligonucleotides or oligomers based on the sequences encoding SIGP to
detect transformants containing DNA or RNA encoding SIGP.

A variety of protocols for detecting and measuring the expression of SIGP, using
either polyclonal or monoclonal antibodies specific for the protein, are known in the art.
Examples of such techniques include enzyme-linked immunosorbent assays (ELISAs),
radioimmunoassays (RIAs), and fluorescence activated cell sorting (FACS). A two-site,
monoclonal-based immunoassay utilizing monoclonal antibodies reactive to two
non-interfering epitopes on SIGP is preferred, but a competitive binding assay may be
employed. These and other assays are well described in the art. (See, e.g., Hampton, R. et
al. (1990) Serological Methods, a Laboratory Manual, APS Press, St Paul, MN, Section
IV; and Maddox, D.E. et al. (1983) J. Exp. Med. 158:1211-1216.)

A wide variety of labels and conjugation techniques are known by those skilled in the 
art and may be used in various nucleic acid and amino acid assays. Means for producing 
labeled hybridization or PCR probes for detecting sequences related to polynucleotides 
encoding SIGP include oligolabeling, nick translation, end-labeling, or PCR amplification
using a labeled nucleotide. Alternatively, the sequences encoding SIGP, or any fragments
thereof, may be cloned into a vector for the production of an mRNA probe. Such vectors
are known in the art, are commercially available, and may be used to synthesize RNA
probes in vitro by addition of an appropriate RNA polymerase such as T7, T3, or SP6 and
labeled nucleotides. These procedures may be conducted using a variety of commercially
available kits, such as those provided by Pharmacia & Upjohn (Kalamazoo, MI), Promega
(Madison, WI), and U.S. Biochemical Corp. (Cleveland, OH). Suitable reporter molecules 
or labels which may be used for ease of detection include radionuclides, enzymes, 
fluorescent, chemiluminescent, or chromogenic agents, as well as substrates, cofactors, 
inhibitors, magnetic particles, and the like. 

Host cells transformed with nucleotide sequences encoding SIGP may be cultured
under conditions suitable for the expression and recovery of the protein from cell culture.
The protein produced by a transformed cell may be secreted or contained intracellularly
depending on the sequence and/or the vector used. As will be understood by those of skill
in the art, expression vectors containing polynucleotides which encode SIGP may be
designed to contain signal sequences which direct secretion of SIGP through a prokaryotic
or eukaryotic cell membrane. Other constructions may be used to join sequences encoding
SIGP to nucleotide sequences encoding a polypeptide domain which will facilitate
purification of soluble proteins. Such purification facilitating domains include, but are not
limited to, metal chelating peptides such as histidine-tryptophan modules that allow
purification on immobilized metals, protein A domains that allow purification on
immobilized immunoglobulin, and the domain utilized in the FLAGS extension/affinity
purification system (Immunex Corp., Seattle, WA). The inclusion of cleavable linker
sequences, such as those specific for Factor XA or enterokinase (Invitrogen, San Diego,
CA), between the purification domain and the SIGP encoding sequence may be used to 
facilitate purification. One such expression vector provides for expression of a fusion 
protein containing SIGP and a nucleic acid encoding 6 histidine residues preceding a 
thioredoxin or an enterokinase cleavage site. The histidine residues facilitate purification
on immobilized metal ion affinity chromatography (IMAC). (See, e.g., Porath, J. et al.
(1992) Prot. Exp. Purif. 3:263-281.) The enterokinase cleavage site provides a means for
purifying SIGP from the fusion protein. (See, e.g., Kroll, D.J. et al. (1993) DNA Cell 
Biol. 12:441-453.) 

Fragments of SIGP may be produced not only by recombinant production, but also by
direct peptide synthesis using solid-phase techniques. (See, e.g., Creighton, T.E. (1984)
Protein: Structures and Molecular Properties, pp. 55-60, W.H. Freeman and Co., New
York, NY.) Protein synthesis may be performed by manual techniques or by automation.
Automated synthesis may be achieved, for example, using the Applied Biosystems 431A
Peptide Synthesizer (Perkin Elmer). Various fragments of SIGP may be synthesized
separately and then combined to produce the full length molecule.

THERAPEUTICS 

The expression of the human signal peptide-containing proteins of the invention
(SIGP) is closely associated with cell proliferation. Therefore, in cancers or immune
response where SIGP is an activator, transcription factor, or enhancer, and is promoting
cell proliferation, it is desirable to decrease the expression of SIGP. In conditions where
SIGP is an inhibitor or suppressor and is controlling or decreasing cell proliferation, it is
desirable to provide the protein or to increase the expression of SIGP.

In one embodiment, where SIGP is an inhibitor, SIGP or a fragment or derivative
thereof may be administered to a subject to treat or prevent a cancer such as
adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma, and
teratocarcinoma. Such cancers include, but are not limited to, cancers of the adrenal gland,
bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia, gastrointestinal
tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid, penis, prostate,
salivary glands, skin, spleen, testis, thymus, thyroid, and uterus.



In another embodiment, a pharmaceutical composition comprising purified SIGP may
be used to treat or prevent a cancer including, but not limited to, those listed above.

WO 99/33981 PCT/US98/27598

In another embodiment, an agonist which is specific for SIGP may be administered to
a subject to treat or prevent a cancer including, but not limited to, those cancers listed
above.

In another further embodiment, a vector capable of expressing SIGP, or a fragment or
a derivative thereof, may be administered to a subject to treat or prevent a cancer
including, but not limited to, those cancers listed above.

In a further embodiment where SIGP is promoting cell proliferation, antagonists
which decrease the expression or activity of SIGP may be administered to a subject to treat
or prevent a cancer such as adenocarcinoma, leukemia, lymphoma, melanoma, myeloma,
sarcoma, and teratocarcinoma. Such cancers include, but are not limited to, cancers of the
adrenal gland, bladder, bone, bone marrow, brain, breast, cervix, gall bladder, ganglia,
gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary, pancreas, parathyroid,
penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus. In one
aspect, antibodies which specifically bind SIGP may be used directly as an antagonist or
indirectly as a targeting or delivery mechanism for bringing a pharmaceutical agent to cells
or tissue which express SIGP.

In another embodiment, a vector expressing the complement of the polynucleotide
encoding SIGP may be administered to a subject to treat or prevent a cancer including, but
not limited to, those cancers listed above.

In yet another embodiment where SIGP is promoting leukocyte activity or
proliferation, antagonists which decrease the activity of SIGP may be administered to a
subject to treat or prevent an immune response. Such responses include, but are not
limited to, disorders such as AIDS, Addison's disease, adult respiratory distress syndrome,
allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis, Crohn's disease,
ulcerative colitis, atopic dermatitis, dermatomyositis, diabetes mellitus, emphysema,
atrophic gastritis, glomerulonephritis, gout, Graves' disease, hypereosinophilia, irritable
bowel syndrome, lupus erythematosus, multiple sclerosis, myasthenia gravis, myocardial
or pericardial inflammation, osteoarthritis, osteoporosis, pancreatitis, polymyositis,
rheumatoid arthritis, scleroderma, Sjogren's syndrome, and autoimmune thyroiditis;
complications of cancer, hemodialysis, extracorporeal circulation; viral, bacterial, fungal,
parasitic, protozoal, and helminthic infections; and trauma. In one aspect, antibodies
which specifically bind SIGP may be used directly as an antagonist or indirectly as a
targeting or delivery mechanism for bringing a pharmaceutical agent to cells or tissue 
which express SIGP. 

In another embodiment, a vector expressing the complement of the polynucleotide 
encoding SIGP may be administered to a subject to treat or prevent an immune response
including, but not limited to, those listed above. 

In other embodiments, any of the proteins, antagonists, antibodies, agonists, 
complementary sequences, or vectors of the invention may be administered in combination 
with other appropriate therapeutic agents. Selection of the appropriate agents for use in 
combination therapy may be made by one of ordinary skill in the art, according to
conventional pharmaceutical principles. The combination of therapeutic agents may act
synergistically to effect the treatment or prevention of the various disorders described
above. Using this approach, one may be able to achieve therapeutic efficacy with lower
dosages of each agent, thus reducing the potential for adverse side effects.

An antagonist of SIGP may be produced using methods which are generally known in
the art. In particular, purified SIGP may be used to produce antibodies or to screen
libraries of pharmaceutical agents to identify those which specifically bind SIGP.
Antibodies to SIGP may also be generated using methods that are well known in the art.
Such antibodies may include, but are not limited to, polyclonal, monoclonal, chimeric, and
single chain antibodies, Fab fragments, and fragments produced by a Fab expression
library. Neutralizing antibodies (i.e., those which inhibit dimer formation) are especially
preferred for therapeutic use. 

For the production of antibodies, various hosts including goats, rabbits, rats, mice, 
humans, and others may be immunized by injection with SIGP or with any fragment or 
oligopeptide thereof which has immunogenic properties. Depending on the host species,
various adjuvants may be used to increase immunological response. Such adjuvants
include, but are not limited to, Freund's, mineral gels such as aluminum hydroxide, and
surface active substances such as lysolecithin, pluronic polyols, polyanions, peptides, oil
emulsions, KLH, and dinitrophenol. Among adjuvants used in humans, BCG (bacilli
Calmette-Guerin) and Corynebacterium parvum are especially preferable.

It is preferred that the oligopeptides, peptides, or fragments used to induce antibodies 
to SIGP have an amino acid sequence consisting of at least about 5 amino acids, and, more
preferably, of at least about 10 amino acids. It is also preferable that these oligopeptides,
peptides, or fragments are identical to a portion of the amino acid sequence of the natural
protein and contain the entire amino acid sequence of a small, naturally occurring
molecule. Short stretches of SIGP amino acids may be fused with those of another
protein, such as KLH, and antibodies to the chimeric molecule may be produced.
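
As a rough illustration of the length guidance above, the Python sketch below lists every contiguous window of a chosen length from a protein sequence. The function name candidate_peptides, the default length of 10, and the lower bound of 5 residues are assumptions drawn from the "at least about 5" and "at least about 10" figures in the text; the example sequence is arbitrary and is not a SIGP sequence.

```python
def candidate_peptides(protein, length=10):
    """Return every contiguous peptide of the given length found in the protein."""
    if length < 5:
        # Assumed floor, following the "at least about 5 amino acids" guidance.
        raise ValueError("peptide length should be at least 5 residues")
    # When the protein is shorter than the window, the range is empty.
    return [protein[i:i + length] for i in range(len(protein) - length + 1)]


if __name__ == "__main__":
    # Arbitrary 12-residue example sequence, used only to show the windowing.
    for peptide in candidate_peptides("ACDEFGHIKLMN", length=10):
        print(peptide)
```

The sketch only enumerates candidates; it does not predict which windows are immunogenic, which would require separate analysis.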

Monoclonal antibodies to SIGP may be prepared using any technique which provides 
for the production of antibody molecules by continuous cell lines in culture. These 
include, but are not limited to, the hybridoma technique, the human B-cell hybridoma 
technique, and the EBV-hybridoma technique. (See, e.g., Kohler, G. et al. (1975) Nature
256:495-497; Kozbor, D. et al. (1985) J. Immunol. Methods 81:31-42; Cote, R.J. et al.
(1983) Proc. Natl. Acad. Sci. 80:2026-2030; and Cole, S.P. et al. (1984) Mol. Cell Biol. 
62:109-120.) 

In addition, techniques developed for the production of "chimeric antibodies," such as 
the splicing of mouse antibody genes to human antibody genes to obtain a molecule with
appropriate antigen specificity and biological activity, can be used. (See, e.g., Morrison,
S.L. et al. (1984) Proc. Natl. Acad. Sci. 81:6851-6855; Neuberger, M.S. et al. (1984)
Nature 312:604-608; and Takeda, S. et al. (1985) Nature 314:452-454.) Alternatively,
techniques described for the production of single chain antibodies may be adapted, using
methods known in the art, to produce SIGP-specific single chain antibodies. Antibodies
with related specificity, but of distinct idiotypic composition, may be generated by chain
shuffling from random combinatorial immunoglobulin libraries. (See, e.g., Burton, D.R.
(1991) Proc. Natl. Acad. Sci. 88:10134-10137.) 

Antibodies may also be produced by inducing in vivo production in the lymphocyte 
population or by screening immunoglobulin libraries or panels of highly specific binding
reagents as disclosed in the literature. (See, e.g., Orlandi, R. et al. (1989) Proc. Natl. Acad.
Sci. 86:3833-3837; and Winter, G. et al. (1991) Nature 349:293-299.)

Antibody fragments which contain specific binding sites for SIGP may also be 
generated. For example, such fragments include, but are not limited to, F(ab')2 fragments 
produced by pepsin digestion of the antibody molecule and Fab fragments generated by
reducing the disulfide bridges of the F(ab')2 fragments. Alternatively, Fab expression
libraries may be constructed to allow rapid and easy identification of monoclonal Fab
fragments with the desired specificity. (See, e.g., Huse, W.D. et al. (1989) Science
246:1275-1281.)

Various immunoassays may be used for screening to identify antibodies having the 
desired specificity. Numerous protocols for competitive binding or immunoradiometric 
assays using either polyclonal or monoclonal antibodies with established specificities are 
well known in the art. Such immunoassays typically involve the measurement of complex
formation between SIGP and its specific antibody. A two-site, monoclonal-based
immunoassay utilizing monoclonal antibodies reactive to two non-interfering SIGP
epitopes is preferred, but a competitive binding assay may also be employed. (Maddox,
supra.)

In another embodiment of the invention, the polynucleotides encoding SIGP, or any
fragment or complement thereof, may be used for therapeutic purposes. In one aspect, the
complement of the polynucleotide encoding SIGP may be used in situations in which it
would be desirable to block the transcription of the mRNA. In particular, cells may be
transformed with sequences complementary to polynucleotides encoding SIGP. Thus,
complementary molecules or fragments may be used to modulate SIGP activity, or to
achieve regulation of gene function. Such technology is now well known in the art, and 
sense or antisense oligonucleotides or larger fragments can be designed from various 
locations along the coding or control regions of sequences encoding SIGP. 

Expression vectors derived from retroviruses, adenoviruses, or herpes or vaccinia
viruses, or from various bacterial plasmids, may be used for delivery of nucleotide
sequences to the targeted organ, tissue, or cell population. Methods which are well known
to those skilled in the art can be used to construct vectors which will express nucleic acid
sequences complementary to the polynucleotides of the gene encoding SIGP. (See, e.g.,
Sambrook, supra; and Ausubel, supra.)

Genes encoding SIGP can be turned off by transforming a cell or tissue with
expression vectors which express high levels of a polynucleotide, or fragment thereof,
encoding SIGP. Such constructs may be used to introduce untranslatable sense or 
antisense sequences into a cell. Even in the absence of integration into the DNA, such 
vectors may continue to transcribe RNA molecules until they are disabled by endogenous
nucleases. Transient expression may last for a month or more with a non-replicating
vector, and may last even longer if appropriate replication elements are part of the vector
system.

As mentioned above, modifications of gene expression can be obtained by designing
complementary sequences or antisense molecules (DNA, RNA, or PNA) to the control, 5',
or regulatory regions of the gene encoding SIGP. Oligonucleotides derived from the 
transcription initiation site, e.g., between about positions -10 and +10 from the start site,
are preferred. Similarly, inhibition can be achieved using triple helix base-pairing
methodology. Triple helix pairing is useful because it causes inhibition of the ability of
the double helix to open sufficiently for the binding of polymerases, transcription factors,
or regulatory molecules. Recent therapeutic advances using triplex DNA have been
described in the literature. (See, e.g., Gee, J.E. et al. (1994) in Huber, B.E. and B.I. Carr,
Molecular and Immunologic Approaches, Futura Publishing Co., Mt. Kisco, NY, pp. 163-177.)
A complementary sequence or antisense molecule may also be designed to block
translation of mRNA by preventing the transcript from binding to ribosomes.
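As an illustration of the preferred window described above, the following is a minimal sketch of how an antisense oligonucleotide spanning about positions -10 to +10 of a start site might be derived. The function name, the 0-based indexing convention, and the example sequence are assumptions made for illustration only and are not part of the disclosure.

```python
# Illustrative sketch only: names and the example sequence are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_oligo(dna, start_index, upstream=10, downstream=10):
    """Return the reverse complement of the window around a start site.

    start_index is the 0-based index of position +1 in the sense strand.
    The window covers positions -upstream to +downstream (there is no
    position 0), so the default window is 20 nucleotides long.
    """
    lo = start_index - upstream
    hi = start_index + downstream
    if lo < 0 or hi > len(dna):
        raise ValueError("window extends past the ends of the sequence")
    window = dna[lo:hi].upper()
    # Complement each base, then reverse to read the oligo 5' to 3'.
    return window.translate(COMPLEMENT)[::-1]


# Made-up sense strand: ten A residues upstream, ten C residues downstream.
example = "AAAAAAAAAA" + "CCCCCCCCCC"
oligo = antisense_oligo(example, start_index=10)
```

The sketch handles DNA only; an RNA or PNA oligomer would use the same base order with the appropriate backbone.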

Ribozymes, enzymatic RNA molecules, may also be used to catalyze the specific 
cleavage of RNA. The mechanism of ribozyme action involves sequence-specific
hybridization of the ribozyme molecule to complementary target RNA, followed by
endonucleolytic cleavage. For example, engineered hammerhead motif ribozyme 
molecules may specifically and efficiently catalyze endonucleolytic cleavage of sequences 
encoding SIGP. 

Specific ribozyme cleavage sites within any potential RNA target are initially 
identified by scanning the target molecule for ribozyme cleavage sites, including the
following sequences: GUA, GUU, and GUC. Once identified, short RNA sequences of 
between 15 and 20 ribonucleotides, corresponding to the region of the target gene 
containing the cleavage site, may be evaluated for secondary structural features which may 
render the oligonucleotide inoperable. The suitability of candidate targets may also be 
evaluated by testing accessibility to hybridization with complementary oligonucleotides
using ribonuclease protection assays. 
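The scanning step described above can be sketched in a few lines. This is a hedged illustration, not the method of the invention: the function name, the choice to center the window on the triplet, and the default window of 17 ribonucleotides (within the stated 15 to 20 range) are all assumptions. Evaluation of secondary structure and accessibility is not attempted here.

```python
# Illustrative sketch only: function name and windowing choice are hypothetical.
CLEAVAGE_TRIPLETS = ("GUA", "GUU", "GUC")


def find_cleavage_windows(rna, window=17):
    """Return (position, triplet, surrounding_sequence) for each cleavage site.

    position is the 0-based index of the G in the triplet. The surrounding
    sequence is `window` ribonucleotides long where the target allows it,
    roughly centered on the triplet and clipped at the ends of the target.
    """
    if not 15 <= window <= 20:
        raise ValueError("window should be between 15 and 20 ribonucleotides")
    rna = rna.upper().replace("T", "U")  # accept DNA-style input as well
    half = (window - 3) // 2
    results = []
    for i in range(len(rna) - 2):
        triplet = rna[i:i + 3]
        if triplet in CLEAVAGE_TRIPLETS:
            lo = max(0, i - half)
            hi = min(len(rna), lo + window)
            lo = max(0, hi - window)  # shift left if clipped at the 3' end
            results.append((i, triplet, rna[lo:hi]))
    return results


# Made-up target containing a single GUC site.
sites = find_cleavage_windows("AAAAAAAAAAGUCAAAAAAAAAA")
```

Each returned window would then be a candidate for the structural and ribonuclease protection evaluations described in the text.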

Complementary ribonucleic acid molecules and ribozymes of the invention may be 
prepared by any method known in the art for the synthesis of nucleic acid molecules. 
These include techniques for chemically synthesizing oligonucleotides such as solid phase 
phosphoramidite chemical synthesis. Alternatively, RNA molecules may be generated by
in vitro and in vivo transcription of DNA sequences encoding SIGP. Such DNA
sequences may be incorporated into a wide variety of vectors with suitable RNA
polymerase promoters such as T7 or SP6. Alternatively, these cDNA constructs that
synthesize complementary RNA, constitutively or inducibly, can be introduced into cell
lines, cells, or tissues.

RNA molecules may be modified to increase intracellular stability and half-life.
Possible modifications include, but are not limited to, the addition of flanking sequences at
the 5' and/or 3' ends of the molecule, or the use of phosphorothioate or 2' O-methyl rather
than phosphodiester linkages within the backbone of the molecule. This concept is
inherent in the production of PNAs and can be extended in all of these molecules by the
inclusion of nontraditional bases such as inosine, queuosine, and wybutosine, as well as
acetyl-, methyl-, thio-, and similarly modified forms of adenine, cytidine, guanine,
thymine, and uridine which are not as easily recognized by endogenous endonucleases.

Many methods for introducing vectors into cells or tissues are available and equally
suitable for use in vivo, in vitro, and ex vivo. For ex vivo therapy, vectors may be
introduced into stem cells taken from the patient and clonally propagated for autologous
transplant back into that same patient. Delivery by transfection, by liposome injections, or
by polycationic amino polymers may be achieved using methods which are well known in 
the art. (See, e.g., Goldman, C.K. et al. (1997) Nature Biotechnology 15:462-466.) 

Any of the therapeutic methods described above may be applied to any subject in 
need of such therapy, including, for example, mammals such as dogs, cats, cows, horses,
rabbits, monkeys, and most preferably, humans.

An additional embodiment of the invention relates to the administration of a 
pharmaceutical or sterile composition, in conjunction with a pharmaceutically acceptable
carrier, for any of the therapeutic effects discussed above. Such pharmaceutical
compositions may consist of SIGP, antibodies to SIGP, and mimetics, agonists,
antagonists, or inhibitors of SIGP. The compositions may be administered alone or in
combination with at least one other agent, such as a stabilizing compound, which may be 
administered in any sterile, biocompatible pharmaceutical carrier including, but not limited 
to, saline, buffered saline, dextrose, and water. The compositions may be administered to a 
patient alone, or in combination with other agents, drugs, or hormones. 

The pharmaceutical compositions utilized in this invention may be administered by
any number of routes including, but not limited to, oral, intravenous, intramuscular,
intra-arterial, intramedullary, intrathecal, intraventricular, transdermal, subcutaneous,
intraperitoneal, intranasal, enteral, topical, sublingual, or rectal means.

In addition to the active ingredients, these pharmaceutical compositions may contain 
suitable pharmaceutically-acceptable carriers comprising excipients and auxiliaries which 
facilitate processing of the active compounds into preparations which can be used 
pharmaceutically. Further details on techniques for formulation and administration may
be found in the latest edition of Remington's Pharmaceutical Sciences (Mack Publishing
Co., Easton, PA).

Pharmaceutical compositions for oral administration can be formulated using
pharmaceutically acceptable carriers well known in the art in dosages suitable for oral
administration. Such carriers enable the pharmaceutical compositions to be formulated as
tablets, pills, dragees, capsules, liquids, gels, syrups, slurries, suspensions, and the like, for
ingestion by the patient.

Pharmaceutical preparations for oral use can be obtained through combining active
compounds with solid excipient and processing the resultant mixture of granules
(optionally, after grinding) to obtain tablets or dragee cores. Suitable auxiliaries can be
added, if desired. Suitable excipients include carbohydrate or protein fillers, such as
sugars, including lactose, sucrose, mannitol, and sorbitol; starch from corn, wheat, rice,
potato, or other plants; cellulose, such as methyl cellulose,
hydroxypropylmethyl-cellulose, or sodium carboxymethylcellulose; gums, including
arabic and tragacanth; and proteins, such as gelatin and collagen. If desired, disintegrating
or solubilizing agents may be added, such as the cross-linked polyvinyl pyrrolidone, agar,
and alginic acid or a salt thereof, such as sodium alginate.

Dragee cores may be used in conjunction with suitable coatings, such as concentrated
sugar solutions, which may also contain gum arabic, talc, polyvinylpyrrolidone, carbopol
gel, polyethylene glycol, and/or titanium dioxide, lacquer solutions, and suitable organic
solvents or solvent mixtures. Dyestuffs or pigments may be added to the tablets or dragee
coatings for product identification or to characterize the quantity of active compound, i.e.,
dosage.

Pharmaceutical preparations which can be used orally include push-fit capsules made
of gelatin, as well as soft, sealed capsules made of gelatin and a coating, such as glycerol
or sorbitol. Push-fit capsules can contain active ingredients mixed with fillers or binders,
such as lactose or starches, lubricants, such as talc or magnesium stearate, and, optionally,
stabilizers. In soft capsules, the active compounds may be dissolved or suspended in
suitable liquids, such as fatty oils, liquid, or liquid polyethylene glycol with or without
stabilizers.

Pharmaceutical formulations suitable for parenteral administration may be formulated 
in aqueous solutions, preferably in physiologically compatible buffers such as Hanks's
solution, Ringer's solution, or physiologically buffered saline. Aqueous injection 
suspensions may contain substances which increase the viscosity of the suspension, such 
as sodium carboxymethyl cellulose, sorbitol, or dextran. Additionally, suspensions of the 
active compounds may be prepared as appropriate oily injection suspensions. Suitable
lipophilic solvents or vehicles include fatty oils, such as sesame oil, or synthetic fatty acid
esters, such as ethyl oleate, triglycerides, or liposomes. Non-lipid polycationic amino 
polymers may also be used for delivery. Optionally, the suspension may also contain 
suitable stabilizers or agents to increase the solubility of the compounds and allow for the 
preparation of highly concentrated solutions. 

For topical or nasal administration, penetrants appropriate to the particular barrier to
be permeated are used in the formulation. Such penetrants are generally known in the art. 

The pharmaceutical compositions of the present invention may be manufactured in a 
manner that is known in the art, e.g., by means of conventional mixing, dissolving, 
granulating, dragee-making, levigating, emulsifying, encapsulating, entrapping, or
lyophilizing processes.

The pharmaceutical composition may be provided as a salt and can be formed with
many acids, including but not limited to, hydrochloric, sulfuric, acetic, lactic, tartaric,
malic, and succinic acid. Salts tend to be more soluble in aqueous or other protonic
solvents than are the corresponding free base forms. In other cases, the preferred
preparation may be a lyophilized powder which may contain any or all of the following: 1
mM to 50 mM histidine, 0.1% to 2% sucrose, and 2% to 7% mannitol, at a pH range of 4.5
to 5.5, that is combined with buffer prior to use.

After pharmaceutical compositions have been prepared, they can be placed in an
appropriate container and labeled for treatment of an indicated condition. For
administration of SIGP, such labeling would include amount, frequency, and method of
administration.

Pharmaceutical compositions suitable for use in the invention include compositions
wherein the active ingredients are contained in an effective amount to achieve the intended
purpose. The determination of an effective dose is well within the capability of those 
skilled in the art. 

For any compound, the therapeutically effective dose can be estimated initially either 
in cell culture assays, e.g., of neoplastic cells, or in animal models such as mice, rats,
rabbits, dogs, or pigs. An animal model may also be used to determine the appropriate 
concentration range and route of administration. Such information can then be used to 
determine useful doses and routes for administration in humans. 

A therapeutically effective dose refers to that amount of active ingredient, for 
example SIGP or fragments thereof, antibodies of SIGP, and agonists, antagonists or
inhibitors of SIGP, which ameliorates the symptoms or condition. Therapeutic efficacy 
and toxicity may be determined by standard pharmaceutical procedures in cell cultures or 
with experimental animals, such as by calculating the ED50 (the dose therapeutically 
effective in 50% of the population) or LD50 (the dose lethal to 50% of the population) 
statistics. The dose ratio of toxic to therapeutic effects is the therapeutic index, and it can
be expressed as the LD50/ED50 ratio. Pharmaceutical compositions which exhibit large
therapeutic indices are preferred. The data obtained from cell culture assays and animal 
studies are used to formulate a range of dosage for human use. The dosage contained in 
such compositions is preferably within a range of circulating concentrations that includes 
the ED50 with little or no toxicity. The dosage varies within this range depending upon
the dosage form employed, the sensitivity of the patient, and the route of administration. 
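A short worked example may help fix the arithmetic. Under the conventional definition, the therapeutic index is the LD50 divided by the ED50, so a larger value indicates a wider margin between the effective dose and the lethal dose. The sketch below assumes that definition; the function name, the formulation names, and all dose values are invented for illustration and do not describe any measured composition.

```python
# Illustrative sketch only: all names and numbers below are hypothetical.
def therapeutic_index(ld50, ed50):
    """Return LD50 / ED50; both doses must be positive and in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("LD50 and ED50 must be positive")
    return ld50 / ed50


# Made-up (LD50, ED50) pairs, in mg/kg, for two imaginary formulations.
candidates = {
    "formulation_A": (500.0, 5.0),
    "formulation_B": (200.0, 20.0),
}

# Rank so that the formulation with the largest index comes first.
ranked = sorted(
    candidates,
    key=lambda name: therapeutic_index(*candidates[name]),
    reverse=True,
)
```

In this invented example formulation_A has an index of 100 and formulation_B an index of 10, so formulation_A would be the preferred of the two on this criterion alone.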

The exact dosage will be determined by the practitioner, in light of factors related to 
the subject requiring treatment. Dosage and administration are adjusted to provide 
sufficient levels of the active moiety or to maintain the desired effect. Factors which may 
be taken into account include the severity of the disease state, the general health of the
subject, the age, weight, and gender of the subject, time and frequency of administration, 
drug combination(s), reaction sensitivities, and response to therapy. Long-acting 
pharmaceutical compositions may be administered every 3 to 4 days, every week, or 
biweekly depending on the half-life and clearance rate of the particular formulation. 

Normal dosage amounts may vary from about 0.1 to 100,000 µg, up to a total dose
of about 1 gram, depending upon the route of administration. Guidance as to particular
dosages and methods of delivery is provided in the literature and generally available to
practitioners in the art. Those skilled in the art will employ different formulations for
nucleotides than for proteins or their inhibitors. Similarly, delivery of polynucleotides or 
polypeptides will be specific to particular cells, conditions, locations, etc. 

DIAGNOSTICS

In another embodiment, antibodies which specifically bind SIGP may be used for the 
diagnosis of disorders characterized by expression of SIGP, or in assays to monitor 
patients being treated with SIGP or agonists, antagonists, or inhibitors of SIGP. 
Antibodies useful for diagnostic purposes may be prepared in the same manner as 
described above for therapeutics. Diagnostic assays for SIGP include methods which
utilize the antibody and a label to detect SIGP in human body fluids or in extracts of cells 
or tissues. The antibodies may be used with or without modification, and may be labeled 
by covalent or non-covalent attachment of a reporter molecule. A wide variety of reporter 
molecules, several of which are described above, are known in the art and may be used.

A variety of protocols for measuring SIGP, including ELISAs, RIAs, and FACS, are
known in the art and provide a basis for diagnosing altered or abnormal levels of SIGP 
expression. Normal or standard values for SIGP expression are established by combining 
body fluids or cell extracts taken from normal mammalian subjects, preferably human, 
with antibody to SIGP under conditions suitable for complex formation. The amount of
standard complex formation may be quantitated by various methods, preferably by
photometric means. Quantities of SIGP expressed in subject, control, and disease samples
from biopsied tissues are compared with the standard values. Deviation between standard 
and subject values establishes the parameters for diagnosing disease. 

In another embodiment of the invention, the polynucleotides encoding SIGP may be 
used for diagnostic purposes. The polynucleotides which may be used include
oligonucleotide sequences, complementary RNA and DNA molecules, and PNAs. The
polynucleotides may be used to detect and quantitate gene expression in biopsied tissues in
which expression of SIGP may be correlated with disease. The diagnostic assay may be
used to determine absence, presence, and excess expression of SIGP, and to monitor
regulation of SIGP levels during therapeutic intervention.

In one aspect, hybridization with PCR probes which are capable of detecting 
polynucleotide sequences, including genomic sequences, encoding SIGP or closely related
molecules may be used to identify nucleic acid sequences which encode SIGP. The
specificity of the probe, whether it is made from a highly specific region, e.g., the 5'
regulatory region, or from a less specific region, e.g., a conserved motif, and the
stringency of the hybridization or amplification (maximal, high, intermediate, or low), will
determine whether the probe identifies only naturally occurring sequences encoding SIGP,
alleles, or related sequences. 

Probes may also be used for the detection of related sequences, and should preferably 
contain at least 50% of the nucleotides from any of the SIGP encoding sequences. The 
hybridization probes of the subject invention may be DNA or RNA and may be derived
from the sequence of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18, SEQ ID NO:19,
SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ ID NO:24, SEQ 
ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID NO:29, and SEQ ID 
NO:30, or from genomic sequences including promoters, enhancers, and introns of the 
SIGP gene. 

Means for producing specific hybridization probes for DNAs encoding SIGP include
the cloning of polynucleotide sequences encoding SIGP or SIGP derivatives into vectors
for the production of mRNA probes. Such vectors are known in the art, are commercially
available, and may be used to synthesize RNA probes in vitro by means of the addition of
the appropriate RNA polymerases and the appropriate labeled nucleotides. Hybridization
probes may be labeled by a variety of reporter groups, for example, by radionuclides such
as 32P or 35S, or by enzymatic labels, such as alkaline phosphatase coupled to the probe via
avidin/biotin coupling systems, and the like.

Polynucleotide sequences encoding SIGP may be used for the diagnosis of a disorder 
associated with either increased or decreased expression of SIGP. Examples of such a
disorder include, but are not limited to, cancers such as adenocarcinoma, leukemia,
lymphoma, melanoma, myeloma, sarcoma, teratocarcinoma, and cancers of the adrenal 
gland, bladder, bone, brain, breast, cervix, gall bladder, ganglia, gastrointestinal tract, 
heart, kidney, liver, lung, bone marrow, muscle, ovary, pancreas, parathyroid, penis, 
prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and uterus; neuronal
disorders such as akathisia, Alzheimer's disease, amnesia, amyotrophic lateral sclerosis,
bipolar disorder, catatonia, cerebral neoplasms, dementia, depression, Down's syndrome,
tardive dyskinesia, dystonias, epilepsy, Huntington's disease, multiple sclerosis,
neurofibromatosis, Parkinson's disease, paranoid psychoses, schizophrenia, and Tourette's
disorder; and immunological disorders such as AIDS, Addison's disease, adult respiratory
distress syndrome, allergies, anemia, asthma, atherosclerosis, bronchitis, cholecystitis,
Crohn's disease, ulcerative colitis, atopic dermatitis, dermatomyositis, diabetes mellitus,
emphysema, atrophic gastritis, glomerulonephritis, gout, Graves' disease,
hypereosinophilia, irritable bowel syndrome, lupus erythematosus, multiple sclerosis, 
myasthenia gravis, myocardial or pericardial inflammation, osteoarthritis, osteoporosis, 
pancreatitis, polymyositis, rheumatoid arthritis, scleroderma, Sjogren's syndrome, and 
thyroiditis. The polynucleotide sequences encoding SIGP may be used in Southern or
northern analysis, dot blot, or other membrane-based technologies; in PCR technologies;
in dipstick, pin, and ELISA assays; and in microarrays utilizing fluids or tissues from
patients to detect altered SIGP expression. Such qualitative or quantitative methods are
well known in the art.

In a particular aspect, the nucleotide sequences encoding SIGP may be useful in
assays that detect the presence of associated disorders, particularly those mentioned above.
The nucleotide sequences encoding SIGP may be labeled by standard methods and added 
to a fluid or tissue sample from a patient under conditions suitable for the formation of 
hybridization complexes. After a suitable incubation period, the sample is washed and the 
signal is quantitated and compared with a standard value. If the amount of signal in the
patient sample is significantly altered in comparison to a control sample then the presence
of altered levels of nucleotide sequences encoding SIGP in the sample indicates the 
presence of the associated disorder. Such assays may also be used to evaluate the efficacy 
of a particular therapeutic treatment regimen in animal studies, in clinical trials, or to 
monitor the treatment of an individual patient. 

In order to provide a basis for the diagnosis of a disorder associated with expression
of SIGP, a normal or standard profile for expression is established. This may be
accomplished by combining body fluids or cell extracts taken from normal subjects, either
animal or human, with a sequence, or a fragment thereof, encoding SIGP, under conditions
suitable for hybridization or amplification. Standard hybridization may be quantified by
comparing the values obtained from normal subjects with values from an experiment in
which a known amount of a substantially purified polynucleotide is used. Standard values
obtained in this manner may be compared with values obtained from samples from
patients who are symptomatic for a disorder. Deviation from standard values is used to
establish the presence of a disorder. 

Once the presence of a disorder is established and a treatment protocol is initiated, 
hybridization assays may be repeated on a regular basis to determine if the level of 
expression in the patient begins to approximate that which is observed in the normal
subject. The results obtained from successive assays may be used to show the efficacy of 
treatment over a period ranging from several days to months. 

With respect to cancer, the presence of a relatively high amount of transcript in 
biopsied tissue from an individual may indicate a predisposition for the development of 
the disease, or may provide a means for detecting the disease prior to the appearance of
actual clinical symptoms. A more definitive diagnosis of this type may allow health 
professionals to employ preventative measures or aggressive treatment earlier thereby 
preventing the development or further progression of the cancer. 

Additional diagnostic uses for oligonucleotides designed from the sequences encoding 
SIGP may involve the use of PCR. These oligomers may be chemically synthesized,
generated enzymatically, or produced in vitro. Oligomers will preferably contain a
fragment of a polynucleotide encoding SIGP, or a fragment of a polynucleotide 
complementary to the polynucleotide encoding SIGP, and will be employed under 
optimized conditions for identification of a specific gene or condition. Oligomers may 
also be employed under less stringent conditions for detection or quantitation of closely
related DNA or RNA sequences. 

Methods which may also be used to quantitate the expression of SIGP include 
radiolabeling or biotinylating nucleotides, coamplification of a control nucleic acid, and 
interpolating results from standard curves. (See, e.g., Melby, P.C. et al. (1993) J. 
Immunol. Methods 159:235-244; and Duplaa, C. et al. (1993) Anal. Biochem. 229-236.)
The speed of quantitation of multiple samples may be accelerated by running the assay in
an ELISA format where the oligomer of interest is presented in various dilutions and a
spectrophotometric or colorimetric response gives rapid quantitation.
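Interpolating results from a standard curve, as mentioned above, can be sketched as follows. This is a minimal illustration assuming piecewise-linear interpolation between adjacent standards; the cited references may use a different curve fit, and the function name and all values are hypothetical.

```python
# Illustrative sketch only: piecewise-linear interpolation is an assumption.
def interpolate_from_standard_curve(standards, signal):
    """Estimate an unknown amount from its measured signal.

    standards is a list of (known_amount, measured_signal) pairs. The two
    standards whose signals bracket the unknown are located and the amount
    is interpolated linearly between them.
    """
    points = sorted(standards, key=lambda pair: pair[1])
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return (a0 + a1) / 2
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the standard curve")


# Made-up dilution series: (amount, absorbance).
curve = [(0.0, 0.0), (10.0, 0.5), (20.0, 1.0)]
estimate = interpolate_from_standard_curve(curve, 0.75)
```

Signals outside the measured range raise an error rather than being extrapolated, since extrapolation beyond the standards is generally less reliable.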

In further embodiments, oligonucleotides or longer fragments derived from any of the 
polynucleotide sequences described herein may be used as targets in a microarray. The
microarray can be used to monitor the expression level of large numbers of genes
simultaneously and to identify genetic variants, mutations, and polymorphisms. This
information may be used to determine gene function, to understand the genetic basis of a
disorder, to diagnose a disorder, and to develop and monitor the activities of therapeutic 
agents. 

In one embodiment, the microarray is prepared and used according to methods known 
in the art. (See, e.g., Chee et al. (1995) PCT application WO95/11995; Lockhart, D.J. et
al. (1996) Nat. Biotech. 14:1675-1680; and Schena, M. et al. (1996) Proc. Natl. Acad. Sci. 
93:10614-10619.) 

The microarray is preferably composed of a large number of unique single-stranded 
nucleic acid sequences, usually either synthetic antisense oligonucleotides or fragments of
cDNAs. The oligonucleotides are preferably about 6 to 60 nucleotides in length, more
preferably about 15 to 30 nucleotides in length, and most preferably about 20 to 25 
nucleotides in length. It may be preferable to use oligonucleotides which are about 7 to 10 
nucleotides in length. The microarray may contain oligonucleotides which cover the 
known 5' or 3' sequence, sequential oligonucleotides which cover the full length sequence,
or unique oligonucleotides selected from particular areas along the length of the sequence.
Polynucleotides used in the microarray may be oligonucleotides specific to a gene or genes 
of interest. Oligonucleotides can also be specific to one or more unidentified cDNAs 
associated with a particular cell type or tissue type. It may be appropriate to use pairs of 
oligonucleotides on a microarray. The first oligonucleotide in each pair differs from the
second oligonucleotide by one nucleotide. This nucleotide is preferably located in the
center of the sequence. The second oligonucleotide serves as a control. The number of 
oligonucleotide pairs may range from about 2 to 1,000,000. 
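The paired-oligonucleotide arrangement can be illustrated with a small sketch that derives the second member of a pair by changing the single central nucleotide. The function name and the choice of substitution (replacing the central base with its complement) are assumptions for illustration; the text does not specify which base is substituted.

```python
# Illustrative sketch only: the substitution rule is a hypothetical choice.
SUBSTITUTE = {"A": "T", "T": "A", "G": "C", "C": "G"}


def central_mismatch(oligo):
    """Return a copy of oligo that differs only at its central nucleotide."""
    oligo = oligo.upper()
    mid = len(oligo) // 2
    return oligo[:mid] + SUBSTITUTE[oligo[mid]] + oligo[mid + 1:]


# Made-up 7-mer and its single-mismatch partner.
first = "ACGTACG"
second = central_mismatch(first)
differences = sum(1 for x, y in zip(first, second) if x != y)
```

For an oligonucleotide of even length there is no exact center, and the sketch changes the base just 3' of the midpoint.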

In order to produce oligonucleotides for use on a microarray, the gene of interest is 
examined using a computer algorithm which starts at the 5' end, or, more preferably, at the
3' end of the nucleotide sequence. The algorithm identifies oligomers of defined length
that are unique to the gene, have a GC content within a range suitable for hybridization, 
and lack secondary structure that may interfere with hybridization. In one aspect, the 
oligomers may be synthesized on a substrate using a light-directed chemical process. (See,
e.g., Chee et al., supra.) The substrate may be any suitable solid support, e.g., paper,
nylon, any other type of membrane, or a filter, chip, or glass slide.
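The oligomer selection algorithm described above can be sketched as follows. This is a simplified illustration, not the algorithm actually used: it scans from the 3' end, keeps oligomers of a defined length whose GC content falls within a chosen range, and requires that each oligomer occur only once in the gene and not at all in a set of other sequences. The secondary structure check is omitted, and the function names, the GC limits, and the test sequences are assumptions.

```python
# Illustrative sketch only: thresholds and names are hypothetical, and the
# secondary-structure screen mentioned in the text is not implemented.
def gc_fraction(seq):
    """Return the fraction of G and C residues in seq."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def select_oligos(gene, background, length=20, gc_min=0.4, gc_max=0.6):
    """Return (start, oligo) pairs, scanning from the 3' end toward the 5' end.

    An oligo is kept when its GC fraction lies in [gc_min, gc_max], it occurs
    exactly once in gene, and it is absent from every background sequence.
    """
    gene = gene.upper()
    others = [other.upper() for other in background]
    selected = []
    for start in range(len(gene) - length, -1, -1):
        oligo = gene[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue
        first = gene.find(oligo)
        if gene.find(oligo, first + 1) != -1:
            continue  # repeated within the gene itself
        if any(oligo in other for other in others):
            continue  # not unique to the gene
        selected.append((start, oligo))
    return selected


# Made-up 12-mer gene, 4-mer oligos, GC content fixed at 50%.
picked = select_oligos("AAAAGCTTAAAA", [], length=4, gc_min=0.5, gc_max=0.5)
filtered = select_oligos(
    "AAAAGCTTAAAA", ["GGAGCTGG"], length=4, gc_min=0.5, gc_max=0.5
)
```

A practical implementation would compare candidates against a full sequence database rather than a short list, and would add a hairpin or self-complementarity screen.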

In another aspect, the oligonucleotides may be synthesized on the surface of the 
substrate using a chemical coupling procedure and an ink jet application apparatus. (See,
e.g., Baldeschweiler et al. (1995) PCT application WO95/251116.) An array analogous to
a dot or slot blot (HYBRIDOT® apparatus, GIBCO/BRL) may be used to arrange and link
cDNA fragments or oligonucleotides to the surface of a substrate using a vacuum system 
or thermal, UV, mechanical, or chemical bonding procedures. An array may also be 
produced by hand or by using available devices, materials, and machines, e.g.,
Brinkmann® multichannel pipettors or robotic instruments. The array may contain from 2
to 1,000,000 or any other feasible number of oligonucleotides. 

In order to conduct sample analysis using the microarrays, polynucleotides are 
extracted from a sample. The sample may be obtained from any bodily fluid, e.g., blood,
urine, saliva, phlegm, gastric juices, cultured cells, biopsies, or other tissue preparations.
To produce probes, the polynucleotides extracted from the sample are used to produce 
nucleic acid sequences complementary to the nucleic acids on the microarray. If the 
microarray contains cDNAs, antisense RNAs (aRNAs) are appropriate probes. Therefore, 
in one aspect, mRNA is reverse-transcribed to cDNA. The cDNA, in the presence of
fluorescent label, is used to produce fragment or oligonucleotide aRNA probes. The
fluorescently labeled probes are incubated with the microarray so that the probes hybridize
to the microarray oligonucleotides. Nucleic acid sequences used as probes can include 
polynucleotides, fragments, and complementary or antisense sequences produced using 
restriction enzymes, PCR, or other methods known in the art. 

Hybridization conditions can be adjusted so that hybridization occurs with varying
degrees of complementarity. A scanner can be used to determine the levels and patterns of
fluorescence after removal of any nonhybridized probes. The degree of complementarity 
and the relative abundance of each oligonucleotide sequence on the microarray can be 
assessed through analysis of the scanned images. A detection system may be used to
measure the absence, presence, or level of hybridization for any of the sequences. (See,
e.g., Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. 94:2150-2155.) 

In another embodiment of the invention, nucleic acid sequences encoding SIGP may 
be used to generate hybridization probes useful in mapping the naturally occurring 
genomic sequence. The sequences may be mapped to a particular chromosome, to a
specific region of a chromosome, or to artificial chromosome constructions, e.g., human
artificial chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial
chromosomes (BACs), bacterial P1 constructions, or single chromosome cDNA libraries.
(See, e.g., Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J. (1991) Trends Genet.
7:149-154.)

Fluorescent in situ hybridization (FISH) may be correlated with other physical 
chromosome mapping techniques and genetic map data. (See, e.g., Heinz-Ulrich, et al.
(1995) in Meyers, R.A. (ed.) Molecular Biology and Biotechnology, VCH Publishers, New
York, NY, pp. 965-968.) Examples of genetic map data can be found in various scientific
journals or at the Online Mendelian Inheritance in Man (OMIM) site. Correlation between 
the location of the gene encoding SIGP on a physical chromosomal map and a specific 
disorder, or a predisposition to a specific disorder, may help define the region of DNA
associated with that disorder. The nucleotide sequences of the invention may be used to
detect differences in gene sequences among normal, carrier, and affected individuals. 

In situ hybridization of chromosomal preparations and physical mapping techniques,
such as linkage analysis using established chromosomal markers, may be used for
extending genetic maps. Often the placement of a gene on the chromosome of another
mammalian species, such as mouse, may reveal associated markers even if the number or
arm of a particular human chromosome is not known. New sequences can be assigned to
chromosomal arms by physical mapping. This provides valuable information to
investigators searching for disease genes using positional cloning or other gene discovery
techniques. Once the disease or syndrome has been crudely localized by genetic linkage to
a particular genomic region, e.g., AT to 11q22-23, any sequences mapping to that area
may represent associated or regulatory genes for further investigation. (See, e.g., Gatti,
R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequence of the subject invention
may also be used to detect differences in the chromosomal location due to translocation,
inversion, etc., among normal, carrier, or affected individuals.

In another embodiment of the invention, SIGP, its catalytic or immunogenic
fragments, or oligopeptides thereof can be used for screening libraries of compounds in
any of a variety of drug screening techniques. The fragment employed in such screening
may be free in solution, affixed to a solid support, borne on a cell surface, or located
intracellularly. The formation of binding complexes between SIGP and the agent being
tested may be measured.

Another technique for drug screening provides for high throughput screening of 
compounds having suitable binding affinity to the protein of interest. (See, e.g., Geysen, 
et al. (1984) PCT application WO84/03564.) In this method, large numbers of different
small test compounds are synthesized on a solid substrate, such as plastic pins or some 
other surface. The test compounds are reacted with SIGP, or fragments thereof, and 
washed. Bound SIGP is then detected by methods well known in the art. Purified SIGP 
can also be coated directly onto plates for use in the aforementioned drug screening 
techniques. Alternatively, non-neutralizing antibodies can be used to capture the peptide 
and immobilize it on a solid support. 

In another embodiment, one may use competitive drug screening assays in which 
neutralizing antibodies capable of binding SIGP specifically compete with a test 
compound for binding SIGP. In this manner, antibodies can be used to detect the presence 
of any peptide which shares one or more antigenic determinants with SIGP. 

In additional embodiments, the nucleotide sequences which encode SIGP may be 
used in any molecular biology techniques that have yet to be developed, provided the new 
techniques rely on properties of nucleotide sequences that are currently known, including, 
but not limited to, such properties as the triplet genetic code and specific base pair 
interactions. 

The examples below are provided to illustrate the subject invention and are not 
included for the purpose of limiting the invention. 

EXAMPLES 

For purposes of example, the preparation and sequencing of the SPLNNOT04 cDNA
library, from which Incyte Clones 1534876 and 1559131 were isolated, is described. 
Preparation and sequencing of cDNAs in libraries in the LIFESEQ™ database have varied 
over time, and the gradual changes involved use of kits, plasmids, and machinery available 
at the particular time the library was made and analyzed. 

I. SPLNNOT04 cDNA Library Construction 

The SPLNNOT04 cDNA library was constructed from microscopically normal spleen 
tissue obtained from a 2-year-old Hispanic male who died of cerebral anoxia. The 
patient's serologies and past medical history were negative. 

The frozen tissue was homogenized and lysed using a Brinkmann Homogenizer
Polytron PT-3000 (Brinkmann Instruments, Westbury, NJ) in guanidinium isothiocyanate
solution. The lysate was centrifuged over a 5.7 M CsCl cushion using a Beckman SW28
rotor in a Beckman L8-70M Ultracentrifuge (Beckman Instruments) for 18 hours at 25,000
rpm at ambient temperature. The RNA was extracted with acid phenol pH 4.0,
precipitated using 0.3 M sodium acetate and 2.5 volumes of ethanol, resuspended in
RNAse-free water and DNase treated at 37 °C. The RNA extraction and precipitation were
repeated as before. The mRNA was then isolated using the Qiagen Oligotex kit (QIAGEN
Inc., Chatsworth, CA) and used to construct the cDNA library.

The mRNA was handled according to the recommended protocols in the Superscript
plasmid system (Cat. #18248-013, GIBCO-BRL, Gaithersburg, MD). cDNA synthesis was
initiated with a NotI-oligo d(T) primer. Double-stranded cDNA was blunted, ligated to
EcoRI adaptors, digested with NotI, fractionated on a Sepharose CL4B column (Cat.
#275105-01, Pharmacia), and those cDNAs exceeding 400 bp were ligated into the NotI
and EcoRI sites of the pINCY 1 vector (Incyte). The plasmid pINCY 1 was subsequently
transformed into DH5α™ competent cells (Cat. #18258-012, GIBCO-BRL).



II. Isolation and Sequencing of cDNA Clones

Plasmid cDNA was released from the cells and purified using the REAL Prep 96 
plasmid kit (Catalog #26173, QIAGEN). The recommended protocol was employed 
except for the following changes: 1) the bacteria were cultured in 1 ml of sterile Terrific 
Broth (Catalog #22711, Gibco-BRL) with carbenicillin at 25 mg/L and glycerol at 0.4%;
2) after inoculation, the cultures were incubated for 19 hours and at the end of incubation,
the cells were lysed with 0.3 ml of lysis buffer; and 3) following isopropanol precipitation,
the plasmid DNA pellet was resuspended in 0.1 ml of distilled water. After the last step in
the protocol, samples were transferred to a 96-well block for storage at 4° C.

cDNAs were sequenced according to the method of Sanger et al. (1975, J. Mol. Biol.
94:441f), using the Perkin Elmer Catalyst 800 or a Hamilton Micro Lab 2200 (Hamilton,
Reno, NV) in combination with Peltier Thermal Cyclers (PTC200 from MJ Research,
Watertown, MA) and Applied Biosystems 377 DNA Sequencing Systems or the Perkin
Elmer 373 DNA Sequencing System, and the reading frame was determined.

III. Homology Searching of cDNA Clones and Their Deduced Proteins 

The nucleotide sequences and/or amino acid sequences of the Sequence Listing were
used to query sequences in the GenBank, SwissProt, BLOCKS, and Pima II databases.
These databases, which contain previously identified and annotated sequences, were
searched for regions of homology using BLAST (Basic Local Alignment Search Tool).
(See, e.g., Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; and Altschul et al. (1990) J.
Mol. Biol. 215:403-410.)

BLAST produced alignments of both nucleotide and amino acid sequences to 
determine sequence similarity. Because of the local nature of the alignments, BLAST was 
especially useful in determining exact matches or in identifying homologs which may be 
of prokaryotic (bacterial) or eukaryotic (animal, fungal, or plant) origin. Other algorithms 
could have been used when dealing with primary sequence patterns and secondary
structure gap penalties. (See, e.g., Smith, T. et al. (1992) Protein Engineering 5:35-51.)
The sequences disclosed in this application have lengths of at least 49 nucleotides and
have no more than 12% uncalled bases (where N is recorded rather than A, C, G, or T).
The BLAST approach searched for matches between a query sequence and a database
sequence. BLAST evaluated the statistical significance of any matches found, and
reported only those matches that satisfied the user-selected threshold of significance. In this
application, the threshold was set at 10^-25 for nucleotides and 10^-8 for peptides.
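
By way of illustration only, the length and uncalled-base limits stated above can be expressed as a simple check. The function name and its default arguments are hypothetical and are not part of the original procedure; the sketch merely restates the two criteria (at least 49 nucleotides, no more than 12% N).

```python
def passes_quality_filter(seq: str, min_length: int = 49, max_uncalled_pct: int = 12) -> bool:
    """Return True if a nucleotide sequence meets the stated length and N-content limits.

    Assumption: uncalled bases are recorded as the letter N (upper or lower case).
    """
    seq = seq.upper()
    if len(seq) < min_length:
        return False
    uncalled = seq.count("N")
    # Integer comparison avoids floating-point rounding at the 12% boundary.
    return uncalled * 100 <= max_uncalled_pct * len(seq)
```

A 52-base read with no uncalled bases would pass this check, whereas a 48-base read, or a 50-base read containing seven N positions (14%), would not.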

Incyte nucleotide sequences were searched against the GenBank databases for primate 
(pri), rodent (rod), and other mammalian sequences (mam), and deduced amino acid 
sequences from the same clones were then searched against GenBank functional protein
databases, mammalian (mamp), vertebrate (vrtp), and eukaryote (eukp), for homology. 

IV. Northern Analysis 

Northern analysis is a laboratory technique used to detect the presence of a transcript 
of a gene and involves the hybridization of a labeled nucleotide sequence to a membrane
on which RNAs from a particular cell type or tissue have been bound. (See, e.g., 
Sambrook, supra, ch. 7; and Ausubel, F.M. et al. supra, ch. 4 and 16.) 

Analogous computer techniques applying BLAST are used to search for identical or 
related molecules in nucleotide databases such as GenBank or LIFESEQ™ database 
(Incyte Pharmaceuticals). This analysis is much faster than multiple membrane-based
hybridizations. In addition, the sensitivity of the computer search can be modified to 
determine whether any particular match is categorized as exact or homologous. 
The basis of the search is the product score, which is defined as:

(% sequence identity x % maximum BLAST score) / 100

The product score takes into account both the degree of similarity between two sequences
and the length of the sequence match. For example, with a product score of 40, the match
will be exact within a 1% to 2% error, and, with a product score of 70, the match will be 
exact. Homologous molecules are usually identified by selecting those which show 
product scores between 15 and 40, although lower scores may identify related molecules.

The results of northern analysis are reported as a list of libraries in which the
transcript encoding SIGP occurs. Abundance and percent abundance are also reported.
Abundance directly reflects the number of times a particular transcript is represented in a 
cDNA library, and percent abundance is abundance divided by the total number of 
sequences examined in the cDNA library. 
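
The product score and percent abundance defined above can be sketched as follows. The function names are hypothetical, and multiplying by 100 to express percent abundance as a percentage is an assumption; the text states only that abundance is divided by the total number of sequences examined.

```python
def product_score(pct_identity: float, pct_max_blast_score: float) -> float:
    """Product score = (% sequence identity x % maximum BLAST score) / 100."""
    return pct_identity * pct_max_blast_score / 100.0


def percent_abundance(abundance: int, total_sequences: int) -> float:
    """Abundance divided by total sequences in the library, scaled to a percentage (assumed)."""
    if total_sequences <= 0:
        raise ValueError("total_sequences must be positive")
    return 100.0 * abundance / total_sequences
```

For example, a match with 80% identity and 50% of the maximum BLAST score gives a product score of 40, and a transcript seen 3 times among 1200 sequences has a percent abundance of 0.25.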

V. Extension of SIGP Encoding Polynucleotides

The nucleic acid sequence of one of the polynucleotides of the present invention was 
used to design oligonucleotide primers for extending a partial nucleotide sequence to full 
length. One primer was synthesized to initiate extension of an antisense polynucleotide, 
and the other was synthesized to initiate extension of a sense polynucleotide. Primers
were used to facilitate the extension of the known sequence "outward," generating
amplicons containing new unknown nucleotide sequence for the region of interest. The
initial primers were designed from the cDNA using OLIGO 4.06 (National Biosciences,
Plymouth, MN), or another appropriate program, to be about 22 to 30 nucleotides in
length, to have a GC content of about 50% or more, and to anneal to the target sequence at
temperatures of about 68 °C to about 72 °C. Any stretch of nucleotides which would result
in hairpin structures and primer-primer dimerizations was avoided. 
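
As a minimal sketch, the length and GC-content criteria above can be screened as shown below. This is not the OLIGO 4.06 algorithm; the function names are hypothetical, and the annealing-temperature, hairpin, and primer-dimer criteria are deliberately not modelled.

```python
def gc_fraction(primer: str) -> float:
    """Fraction of G and C bases in a non-empty primer."""
    primer = primer.upper()
    return (primer.count("G") + primer.count("C")) / len(primer)


def meets_primer_criteria(primer: str, min_len: int = 22, max_len: int = 30,
                          min_gc: float = 0.50) -> bool:
    """Check only the length (about 22 to 30 nt) and GC content (about 50% or more)."""
    if not min_len <= len(primer) <= max_len:
        return False
    return gc_fraction(primer) >= min_gc
```

A candidate that passes this screen would still need to be checked for annealing temperature and secondary structure with an appropriate primer-design program.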

Selected human cDNA libraries (GIBCO/BRL) were used to extend the sequence. If
more than one extension is necessary or desired, additional sets of primers are designed to 
further extend the known region. 

High fidelity amplification was obtained by following the instructions for the XL-PCR
kit (Perkin Elmer) and thoroughly mixing the enzyme and reaction mix. PCR was
performed using the Peltier Thermal Cycler (PTC200; M.J. Research, Watertown, MA),
beginning with 40 pmol of each primer and the recommended concentrations of all other
components of the kit, with the following parameters:

Step 1     94° C for 1 min (initial denaturation)
Step 2     65° C for 1 min
Step 3     68° C for 6 min
Step 4     94° C for 15 sec
Step 5     65° C for 1 min
Step 6     68° C for 7 min
Step 7     Repeat steps 4 through 6 for an additional 15 cycles
Step 8     94° C for 15 sec
Step 9     65° C for 1 min
Step 10    68° C for 7:15 min
Step 11    Repeat steps 8 through 10 for an additional 12 cycles
Step 12    72° C for 8 min
Step 13    4° C (and holding)
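
For illustration, the cycling parameters above can be written as a small data structure from which the programmed run time is computed. The names are hypothetical, and reading "an additional N cycles" as N + 1 passes in total is an interpretation; ramp times between temperatures and the final 4° C hold are not counted.

```python
# Each segment is (number_of_passes, [(temperature_C, seconds), ...]).
# "Repeat ... for an additional N cycles" is read here as N + 1 passes in total.
EXTENSION_PROGRAM = [
    (1, [(94, 60), (65, 60), (68, 360)]),    # Steps 1-3
    (16, [(94, 15), (65, 60), (68, 420)]),   # Steps 4-6 plus 15 additional cycles (Step 7)
    (13, [(94, 15), (65, 60), (68, 435)]),   # Steps 8-10 plus 12 additional cycles (Step 11)
    (1, [(72, 480)]),                        # Step 12
]


def total_seconds(program) -> int:
    """Sum the hold times of every step over all passes of every segment."""
    return sum(passes * sum(sec for _, sec in steps) for passes, steps in program)


def format_hms(seconds: int) -> str:
    """Render a duration in seconds as hours, minutes, and seconds."""
    hours, rest = divmod(seconds, 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours} h {minutes} min {secs} s"
```

Under these assumptions the programmed hold time is 15,510 seconds, i.e., 4 h 18 min 30 s, before the final hold.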

A 5 μl to 10 μl aliquot of the reaction mixture was analyzed by electrophoresis on
a low concentration (about 0.6% to 0.8%) agarose mini-gel to determine which reactions
were successful in extending the sequence. Bands thought to contain the largest products
were excised from the gel, purified using QIAQuick™ (QIAGEN Inc., Chatsworth, CA),
and trimmed of overhangs using Klenow enzyme to facilitate religation and cloning.

After ethanol precipitation, the products were redissolved in 13 μl of ligation
buffer, 1 μl T4-DNA ligase (15 units) and 1 μl T4 polynucleotide kinase were added, and
the mixture was incubated at room temperature for 2 to 3 hours, or overnight at 16° C.
Competent E. coli cells (in 40 μl of appropriate media) were transformed with 3 μl of
ligation mixture and cultured in 80 μl of SOC medium. (See, e.g., Sambrook, supra,
Appendix A, p. 2.) After incubation for one hour at 37° C, the E. coli mixture was plated
on Luria Bertani (LB) agar (See, e.g., Sambrook, supra, Appendix A, p. 1) containing 2x
Carb. The following day, several colonies were randomly picked from each plate and
cultured in 150 μl of liquid LB/2x Carb medium placed in an individual well of an
appropriate commercially-available sterile 96-well microtiter plate. The following day, 5
μl of each overnight culture was transferred into a non-sterile 96-well plate and, after
dilution 1:10 with water, 5 μl from each sample was transferred into a PCR array.

For PCR amplification, 18 μl of concentrated PCR reaction mix (3.3x) containing
4 units of rTth DNA polymerase, a vector primer, and one or both of the gene specific
primers used for the extension reaction was added to each well. Amplification was
performed using the following conditions:

Step 1     94° C for 60 sec
Step 2     94° C for 20 sec
Step 3     55° C for 30 sec
Step 4     72° C for 90 sec
Step 5     Repeat steps 2 through 4 for an additional 29 cycles
Step 6     72° C for 180 sec
Step 7     4° C (and holding)



Aliquots of the PCR reactions were run on agarose gels together with molecular 
weight markers. The sizes of the PCR products were compared to the original partial 
cDNAs, and appropriate clones were selected, ligated into plasmid, and sequenced. 

In like manner, one of the nucleotide sequences of the
present invention was used to obtain 5' regulatory sequences using the procedure above,
oligonucleotides designed for 5' extension, and an appropriate genomic library.



VI. Labeling and Use of Individual Hybridization Probes 

Hybridization probes derived from one of the nucleotide sequences of the present
invention are employed to screen cDNAs, genomic DNAs, or mRNAs. Although the
labeling of oligonucleotides, consisting of about 20 base pairs, is specifically described,
essentially the same procedure is used with larger nucleotide fragments. Oligonucleotides
are designed using state-of-the-art software such as OLIGO 4.06 (National Biosciences)
and labeled by combining 50 pmol of each oligomer, 250 μCi of [γ-32P] adenosine
triphosphate (Amersham, Chicago, IL), and T4 polynucleotide kinase (DuPont NEN®,
Boston, MA). The labeled oligonucleotides are substantially purified using a Sephadex G-25
superfine resin column (Pharmacia & Upjohn, Kalamazoo, MI). An aliquot containing
10^7 counts per minute of the labeled probe is used in a typical membrane-based
hybridization analysis of human genomic DNA digested with one of the following
endonucleases: Ase I, Bgl II, Eco RI, Pst I, Xba I, or Pvu II (DuPont NEN, Boston, MA).

The DNA from each digest is fractionated on a 0.7 percent agarose gel and
transferred to nylon membranes (Nytran Plus, Schleicher & Schuell, Durham, NH).
Hybridization is carried out for 16 hours at 40°C. To remove nonspecific signals, blots
are sequentially washed at room temperature under increasingly stringent conditions up to
0.1 x saline sodium citrate and 0.5% sodium dodecyl sulfate. After XOMAT AR™ film
(Kodak, Rochester, NY) is exposed to the blots for several hours, hybridization
patterns are compared visually.

VII. Microarrays 

To produce oligonucleotides for a microarray, one of the nucleotide sequences of
the present invention is examined using a computer algorithm which starts at the 3' end of
the nucleotide sequence. For each, the algorithm identifies oligomers of defined length
that are unique to the nucleic acid sequence, have a GC content within a range suitable for
hybridization, and lack secondary structure that would interfere with hybridization. The
algorithm identifies approximately 20 oligonucleotides corresponding to each nucleic acid
sequence. For each sequence-specific oligonucleotide, a pair of oligonucleotides is
synthesized in which the first oligonucleotide differs from the second oligonucleotide by
one nucleotide in the center of the sequence. The oligonucleotide pairs can be arranged on
a substrate, e.g., a silicon chip, using a light-directed chemical process. (See, e.g., Chee,
supra.)
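
The oligomer selection described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm actually used: the oligomer length of 20, the GC window of 40% to 60%, the use of the complementary base for the central mismatch, and all function names are hypothetical, and the secondary-structure criterion is not modelled.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in a non-empty sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def candidate_oligos(seq, length=20, gc_min=0.40, gc_max=0.60, others=()):
    """Yield (start, oligomer) pairs, scanning from the 3' end toward the 5' end.

    An oligomer is kept if its GC content lies in the assumed window and it does
    not occur in any of the other sequences supplied (a stand-in for uniqueness).
    """
    seq = seq.upper()
    for start in range(len(seq) - length, -1, -1):
        oligo = seq[start:start + length]
        if not gc_min <= gc_fraction(oligo) <= gc_max:
            continue
        if any(oligo in other.upper() for other in others):
            continue
        yield start, oligo


def mismatch_partner(oligo: str) -> str:
    """Return a copy of the oligomer that differs by one nucleotide at its center."""
    mid = len(oligo) // 2
    return oligo[:mid] + COMPLEMENT[oligo[mid]] + oligo[mid + 1:]
```

Each selected oligomer and its `mismatch_partner` would form one of the oligonucleotide pairs described above; in practice roughly the first 20 candidates would be retained per sequence.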

In the alternative, a chemical coupling procedure and an ink jet device can be used
to synthesize oligomers on the surface of a substrate. (See, e.g., Baldeschweiler, supra.)
An array analogous to a dot or slot blot may also be used to arrange and link fragments or
oligonucleotides to the surface of a substrate using thermal, UV, mechanical, or
chemical bonding procedures, or a vacuum system. A typical array may be produced by
hand or using available methods and machines and contain any appropriate number of
elements. After hybridization, nonhybridized probes are removed and a scanner is used to
determine the levels and patterns of fluorescence. The degree of complementarity and the
relative abundance of each oligonucleotide sequence on the microarray may be assessed
through analysis of the scanned images.

VIII. Complementary Polynucleotides 

Sequences complementary to the SIGP-encoding sequences, or any parts thereof,
are used to detect, decrease, or inhibit expression of naturally occurring SIGP. Although
use of oligonucleotides comprising from about 15 to 30 base pairs is described, essentially
the same procedure is used with smaller or with larger sequence fragments. Appropriate
oligonucleotides are designed using Oligo 4.06 software and the coding sequence of SIGP.
To inhibit transcription, a complementary oligonucleotide is designed from the most
unique 5' sequence and used to prevent promoter binding to the coding sequence. To
inhibit translation, a complementary oligonucleotide is designed to prevent ribosomal
binding to the SIGP-encoding transcript.
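
As a minimal sketch, a complementary oligonucleotide for a chosen stretch of coding sequence can be derived by reverse complementation. The function names and the example coordinates are hypothetical; choosing the "most unique" 5' region is left to the designer and is not modelled here.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence, written 5' to 3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def antisense_oligo(coding: str, start: int, length: int) -> str:
    """Complementary oligonucleotide for coding[start:start + length] (0-based start)."""
    return reverse_complement(coding[start:start + length])
```

For instance, `antisense_oligo("ATGGCCAAA", 0, 6)` gives `GGCCAT`, the oligonucleotide complementary to the first six bases of that illustrative coding sequence.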


IX. Expression of SIGP 

Expression of SIGP is accomplished by subcloning the cDNA into an appropriate
vector and transforming the vector into host cells. This vector contains an appropriate
promoter, e.g., β-galactosidase upstream of the cloning site, operably associated with the
cDNA of interest. (See, e.g., Sambrook, supra, pp. 404-433; and Rosenberg, M. et al.
(1983) Methods Enzymol. 101:123-138.)

Induction of an isolated, transformed bacterial strain with isopropyl beta-D-thiogalactopyranoside
(IPTG) using standard methods produces a fusion protein which
consists of the first 8 residues of β-galactosidase, about 5 to 15 residues of linker, and the
full length protein. The signal residues direct the secretion of SIGP into bacterial growth
media which can be used directly in the following assay for activity.



X. Production of SIGP Specific Antibodies 

SIGP substantially purified using PAGE electrophoresis (see, e.g., Harrington,
M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to
immunize rabbits and to produce antibodies using standard protocols. The SIGP amino
acid sequence is analyzed using DNASTAR software (DNASTAR Inc.) to determine
regions of high immunogenicity, and a corresponding oligopeptide is synthesized and used
to raise antibodies by means known to those of skill in the art. Methods for selection of
appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well
described in the art. (See, e.g., Ausubel et al. supra, ch. 11.)

Typically, the oligopeptides are 15 residues in length, and are synthesized using
an Applied Biosystems Peptide Synthesizer Model 431A using Fmoc chemistry and
coupled to KLH (Sigma, St. Louis, MO) by reaction with N-maleimidobenzoyl-N-hydroxysuccinimide
ester (MBS) to increase immunogenicity. (See, e.g., Ausubel et al.
supra.) Rabbits are immunized with the oligopeptide-KLH complex in complete Freund's
adjuvant. Resulting antisera are tested for antipeptide activity, for example, by binding the
peptide to plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and
reacting with radio-iodinated goat anti-rabbit IgG.



XI. Purification of Naturally Occurring SIGP Using Specific Antibodies 

Naturally occurring or recombinant SIGP is substantially purified by
immunoaffinity chromatography using antibodies specific for SIGP. An immunoaffinity
column is constructed by covalently coupling anti-SIGP antibody to an activated
chromatographic resin, such as CNBr-activated Sepharose (Pharmacia & Upjohn). After
the coupling, the resin is blocked and washed according to the manufacturer's instructions.

Media containing SIGP are passed over the immunoaffinity column, and the
column is washed under conditions that allow the preferential absorbance of SIGP (e.g.,
high ionic strength buffers in the presence of detergent). The column is eluted under 
conditions that disrupt antibody/SIGP binding (e.g., a buffer of pH 2 to pH 3, or a high 
concentration of a chaotrope, such as urea or thiocyanate ion), and SIGP is collected. 


XII. Identification of Molecules Which Interact with SIGP 

SIGP, or biologically active fragments thereof, are labeled with 125I Bolton-Hunter
reagent. (See, e.g., Bolton et al. (1973) Biochem. J. 133:529.) Candidate molecules
previously arrayed in the wells of a multi-well plate are incubated with the labeled SIGP,
washed, and any wells with labeled SIGP complex are assayed. Data obtained using
different concentrations of SIGP are used to calculate values for the number, affinity, and
association of SIGP with the candidate molecules.

Various modifications and variations of the described methods and systems of the
invention will be apparent to those skilled in the art without departing from the scope and
spirit of the invention. Although the invention has been described in connection with
specific preferred embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed, various modifications
of the described modes for carrying out the invention which are obvious to those skilled in
molecular biology or related fields are intended to be within the scope of the following
claims.

What is claimed is:

1. A substantially purified human signal peptide-containing protein (SIGP)
comprising a polypeptide having an amino acid sequence selected from the group
consisting of SEQ ID NO:1, SEQ ID NO:2, SEQ ID NO:3, SEQ ID NO:4, SEQ ID NO:5,
SEQ ID NO:6, SEQ ID NO:7, SEQ ID NO:8, SEQ ID NO:9, SEQ ID NO:10, SEQ ID
NO:11, SEQ ID NO:12, SEQ ID NO:13, SEQ ID NO:14, and SEQ ID NO:15.

2. An isolated and purified polynucleotide which hybridizes under stringent 
conditions to the polynucleotide encoding an SIGP of claim 1. 

3. An isolated and purified polynucleotide encoding the SIGP of claim 1.

4. A microarray containing at least a fragment of at least one of the
polynucleotides encoding an SIGP of claim 1.

5. An isolated and purified polynucleotide variant having at least 90% 
polynucleotide identity to the polynucleotide of claim 3. 

6. A composition comprising the polynucleotide of claim 3. 

7. An isolated and purified polynucleotide which hybridizes under stringent
conditions to the polynucleotide of claim 3.

8. An isolated and purified polynucleotide which is complementary to the 

polynucleotide of claim 3. 

9. An isolated and purified polynucleotide having a nucleic acid sequence
selected from the group consisting of SEQ ID NO:16, SEQ ID NO:17, SEQ ID NO:18,
SEQ ID NO:19, SEQ ID NO:20, SEQ ID NO:21, SEQ ID NO:22, SEQ ID NO:23, SEQ
ID NO:24, SEQ ID NO:25, SEQ ID NO:26, SEQ ID NO:27, SEQ ID NO:28, SEQ ID
NO:29, and SEQ ID NO:30.

10. An isolated and purified polynucleotide variant having at least 90%
polynucleotide identity to the polynucleotide of claim 9.

11. An isolated and purified polynucleotide which is complementary to the 
polynucleotide sequence of claim 9. 

12. An expression vector containing at least a fragment of the polynucleotide of 

claim 3. 

13. A host cell containing the expression vector of claim 12.

14. A method for producing a polypeptide encoding a human signal peptide-containing
protein, the method comprising the steps of:
(a) culturing the host cell of claim 13 under conditions suitable for the
expression of the polypeptide; and

(b) recovering the polypeptide from the host cell culture. 

15. A pharmaceutical composition comprising the SIGP of claim 1 in 
conjunction with a suitable pharmaceutical carrier.

16. A purified antibody which specifically binds to the SIGP of claim 1.

17. A purified agonist of the SIGP of claim 1. 

18. A purified antagonist of the SIGP of claim 1.

19. A method for treating or preventing a cancer, the method comprising
administering to a subject in need of such treatment an effective amount of the
pharmaceutical composition of claim 15.

20. A method for treating or preventing a cancer, the method comprising 
administering to a subject in need of such treatment an effective amount of the antagonist 
of claim 18.

21. A method for treating or preventing an immune response, the method
comprising administering to a subject in need of such treatment an effective amount of the
antagonist of claim 18.

22. A method for detecting a polynucleotide encoding a human signal peptide-containing
protein in a biological sample containing nucleic acids, the method comprising
the steps of:

(a) hybridizing the polynucleotide of claim 8 to at least one of the
nucleic acids of the biological sample, thereby forming a hybridization complex;
and

(b) detecting the hybridization complex, wherein the presence of the
hybridization complex correlates with the presence of a polynucleotide encoding
SIGP in the biological sample.

23. The method of claim 22 wherein the nucleic acids of the biological sample 
are amplified by the polymerase chain reaction prior to the hybridizing step. 
Pro Lys Thr Leu Leu Ala Phe Glu Asp Met Leu Glu Asn
        365                 370                 375

Pro Leu Asn Ser Thr Gln Trp Met Asn Asp Pro Glu Thr Gly Pro
                380                 385                 390

Val Met Leu Gln Ile Ser Arg Ile Phe Gln Thr Leu Asn Arg Thr
                395                 400                 405

<210> 9
<211> 177
<212> PRT
<213> Homo sapiens
<220>
<221> unsure
<222> 170, 171
<223> unknown, or other
<220> -
<223> 1871375
<400> 9

Met Val Met His Asn Ser Asp Pro Asn Leu His Leu Leu Ala Glu
1               5                   10                  15

Gly Ala Pro Ile Asp Trp Gly Glu Glu Tyr Ser Asn Ser Gly Gly
                20                  25                  30

Gly Gly Ser Pro Ala Pro Ala Pro Arg Ser Gln Pro Pro Ser Arg
                35                  40                  45

Lys Ser Asp Gly Ala Pro Ser Arg Trp Ser Leu Trp Ser Arg Met
                50                  55                  60

Arg Arg Trp Gly Cys Pro Leu Arg Leu Ala Leu Ser His His His
                65                  70                  75

Leu Arg Pro Arg Thr Val Ser Leu Arg Ser Glu Ala Cys Trp Pro
                80                  85                  90

Lys Val Cys Gly Leu Arg Ala Pro His Gln Pro Ala Pro Cys Ser
                95                  100                 105

Thr Gly Pro Pro Leu Gly Arg Val Pro Ser Leu Arg Pro Pro Pro
                110                 115                 120

Arg Pro Pro Arg Arg Leu Pro His Pro Ser Ser Ile Ser Cys Leu
                125                 130                 135

Glu Arg Leu Trp Thr Leu Gly Pro Pro Ser Pro Ala Thr Arg Arg
                140                 145                 150

Leu Glu Ser Arg Cys Pro Ala Pro Ala Ala Thr Pro Pro Ser Thr
                155                 160                 165

Pro Pro Pro Arg Xaa Xaa Phe Lys Gly Cys Lys Asn
                170                 175

<210> 10
<211> 197
<212> PRT
<213> Homo sapiens
<220> -
<223> 1880830
<400> 10

Met Ile Thr Cys Arg Val Cys Gln Ser Leu Ile Asn Val Glu Gly
1               5                   10                  15

Lys Met His Gln His Val Val Lys Cys Gly Val Cys Asn Glu Ala
                20                  25                  30

Thr Pro Ile Lys Asn Ala Pro Pro Gly Lys Lys Tyr Val Arg Cys
                35                  40                  45

Pro Cys Asn Cys Leu Leu Ile Cys Lys Val Thr Ser Gln Arg Ile
                50                  55                  60

Ala Cys Pro Arg Pro Tyr Cys Lys Arg Ile Ile Asn Leu Gly Pro
                65                  70                  75

Val His Pro Gly Pro Leu Ser Pro Glu Pro Gln Pro Met Gly Val
                80                  85                  90

Arg Val Ile Cys Gly His Cys Lys Asn Thr Phe Leu Trp Thr Glu
                95                  100                 105

Phe Thr Asp Arg Thr Leu Ala Arg Cys Pro His Cys Arg Lys Val
                110                 115                 120

Ser Ser Ile Gly Arg Arg Tyr Pro Arg Lys Arg Cys Ile Cys Cys
                125                 130                 135

Phe Leu Leu Gly Leu Leu Leu Ala Val Thr Ala Thr Gly Leu Ala
                140                 145                 150

Phe Gly Thr Trp Lys His Ala Arg Arg Tyr Gly Gly Ile Tyr Ala
                155                 160                 165

Ala Trp Ala Phe Val Ile Leu Leu Ala Val Leu Cys Leu Gly Arg
                170                 175                 180

Ala Leu Tyr Trp Ala Cys Met Lys Val Ser His Pro Val Gln Asn
                185                 190                 195

Phe Ser

<210> 11
<211> 346
<212> PRT
<213> Homo sapiens
<220> -
<223> 2328134
<400> 11

Met Thr Pro Arg Thr Trp Trp Pro Arg Pro Ala Gly Trp Gly Thr
1               5                   10                  15

Cys Arg Ala Ala Gly Trp Pro Arg Ser Val Pro Trp Ala Arg Thr
                20                  25                  30

Ala Ala Ser Leu Val Phe Val Pro Thr Arg Arg Arg Ser Gly Pro
                35                  40                  45

Ser Gly Thr Ala Ser Val Ala Ala Met Ala Tyr His Ser Gly Tyr
                50                  55                  60

Gly Ala His Gly Ser Lys His Arg Ala Arg Ala Ala Pro Asp Pro
                65                  70                  75

Pro Pro Leu Phe Asp Asp Thr Ser Gly Gly Tyr Ser Ser Gln Pro
                80                  85                  90

Gly Gly Tyr Pro Ala Thr Gly Ala Asp Val Ala Phe Ser Val Asn
                95                  100                 105

His Leu Leu Gly Asp Pro Met Ala Asn Val Ala Met Ala Tyr Gly
                110                 115                 120

Ser Ser Ile Ala Ser His Gly Lys Asp Met Val His Lys Glu Leu
                125                 130                 135

His Arg Phe Val Ser Val Ser Lys Leu Lys Tyr Phe Phe Ala Val
                140                 145                 150

Asp Thr Ala Tyr Val Ala Lys Lys Leu Gly Leu Leu Val Phe Pro
                155                 160                 165

Tyr Thr His Gln Asn Trp Glu Val Gln Tyr Ser Arg Asp Ala Pro
                170                 175                 180

Leu Pro Pro Arg Gln Asp Leu Asn Ala Pro Asp Leu Tyr Ile Pro
                185                 190                 195

Thr Met Ala Phe Ile Thr Tyr Val Leu Leu Ala Gly Met Ala Leu
                200                 205                 210

Gly Ile Gln Lys Arg Phe Ser Pro Glu Val Leu Gly Leu Cys Ala
                215                 220                 225

Ser Thr Ala Leu Val Val Val Met Glu Val Leu Ala Leu Leu
                230             235                 240

Leu Gly Leu Tyr Leu Ala Thr Val Arg Ser Asp Leu Ser Thr Phe
                245                 250                 255

His Leu Leu Ala Tyr Ser Gly Tyr Lys Tyr Val Gly Met Ile Leu
                260                 265                 270

Ser Val Leu Thr Gly Leu Leu Phe Gly Ser Asp Gly Tyr Tyr Val
                275                 280                 285

Ala Leu Ala Trp Thr Ser Ser Ala Leu Met Tyr Phe Ile Val Arg
                290                 295                 300

Ser Leu Arg Thr Ala Ala Leu Gly Pro Asp Ser Met Gly Gly Pro
                305                 310                 315

Val Pro Arg Gln Arg Leu Gln Leu Tyr Leu Thr Leu Gly Ala Ala
                320                 325                 330

Ala Phe Gln Pro Leu Ile Ile Tyr Trp Leu Thr Phe His Leu Val
                335                 340                 345

Arg

<210> 12 
<211> 256 
<212> PRT 

<213> Homo sapiens 

<220> - 

<223> 2652271 

<400> 12 



Met Arg Pro Ala Ala Leu Arg Gly Ala Leu Leu Gly Cys Leu Cys
1 5 10 15

Leu Ala Leu Leu Cys Leu Gly Gly Ala Asp Lys Arg Leu Arg Asp
20 25 30

Asn His Glu Trp Lys Lys Leu Ile Met Val Gln His Trp Pro Glu
35 40 45

Thr Val Cys Glu Lys Ile Gln Asn Asp Cys Arg Asp Pro Pro Asp
50 55 60

Tyr Trp Thr Ile His Gly Leu Trp Pro Asp Lys Ser Glu Gly Cys
65 70 75

Asn Arg Ser Trp Pro Phe Asn Leu Glu Glu Ile Lys Asp Leu Leu
80 85 90

Pro Glu Met Arg Ala Tyr Trp Pro Asp Val Ile His Ser Phe Pro
95 100 105

Asn Arg Ser Arg Phe Trp Lys His Glu Trp Glu Lys His Gly Thr
110 115 120

Cys Ala Ala Gln Val Asp Ala Leu Asn Ser Gln Lys Lys Tyr Phe
125 130 135

Gly Arg Ser Leu Glu Leu Tyr Arg Glu Leu Asp Leu Asn Ser Val
140 145 150

Leu Leu Lys Leu Gly Ile Lys Pro Ser Ile Asn Tyr Tyr Gln Val
155 160 165

Ala Asp Phe Lys Asp Ala Leu Ala Arg Val Tyr Gly Val Ile Pro
170 175 180

Lys Ile u^n Cys Leu Pro Pro Ser Gln Asp Glu Glu Val Gln Thr
185 190 195

Ile Gly Gln Ile Glu Leu Cys Leu Thr Lys Gln Asp Gln Gln Leu
200 205 210

Gln Asn Cys Thr Glu Pro Gly Glu Gln Pro Ser Pro Lys Gln Glu
215 220 225

Val Trp Leu Ala Asn Gly Ala Ala Glu Ser Arg Gly Leu Arg Val
230 235 240

Cys Glu Asp Gly Pro Val Phe Tyr Pro Pro Pro Lys Lys Thr Lys
245 250 255

His



<210> 13 
<211> 235 
<212> PRT 

<213> Homo sapiens 

<220> - 

<223> 2965248 

<400> 13 



Met Ala Ser Thr Ile Ser Ala Tyr Lys ulU Lys Lys Glu Leu
1 5 10 15

Ser Val Leu Ser Leu Ile Cys Ser Cys Phe Tyr Thr Gln Pro His
20 25 30

Pro Asn Thr Val Tyr Gln Tyr Gly Asp Met Glu Val Lys Gln Leu
35 40 45

Asp Lys Arg Ala Ser Gly Gln Ser Phe Glu Val Ile Leu Lys Ser
50 55 60

Pro Ser Asp Leu Ser Pro Glu Ser Pro Met Leu Ser Ser Pro Pro
65 70 75

Lys Lys Lys Asp Thr Ser Leu Glu Glu Leu Gln Lys Arg Leu Glu
80 85 90

Ala Ala Glu Glu Arg Arg Lys Thr Gln Glu Ala Gln Val Leu Lys
95 100 105

Gln Leu Ala Asp Gly Ala Ser Thr Ser Ala Arg Cys Cys Thr Arg
110 115 120

Arg Trp Arg Arg Ile Thr Thr Ser Ala Ala Arg Arg Arg Arg Ser
125 130 135

Ser Thr Thr Arg Trp Ser Ser Ala Arg Arg Ser Ala Arg His Thr
140 145 150

Trp Pro His Cys Ala Ser Gly Cys Ala Arg Arg Ser Cys Thr Arg
155 160 165

Pro Arg Cys Ala Gly Thr Arg Ser Ser Glu Lys Arg Cys Arg Ala
170 175 180

Lys Gly Pro Gly Arg Ala Ala Pro Ile Leu Arg Arg Asn Thr Phe
185 190 195

Gly Phe Trp Phe Cys Phe Val His Leu Cys Leu Asp Ala Thr Phe
200 205 210

Val Pro Pro Pro Pro Pro Gln Pro Pro Ala Ser Cys Phe Ser Ser
215 220 225

Ala Leu Ser Arg Pro Ala Leu Ser Ser Trp
230 235

<210> 14

<211> 371 

<212> PRT 

<213> Homo sapiens 



<220> - 

<223> 3057669 

<400> 14 



Met Asp His Glu Asp Ile Ser Glu Ser Val Asp Ala Ala Tyr Asn
1 5 10 15

Leu Gln Asp Ser Cys Leu Thr Asp Cys Asp Val Glu Asp Gly Thr
20 25 30

Met Asp Gly Asn Asp Glu Gly His Ser Phe Glu Leu Cys Pro Ser
35 40 45

Glu Ala Ser Pro Tyr Val Arg Ser Arg Glu Arg Thr Ser Ser Ser
50 55 60

Ile Val Phe Glu Asp Ser Gly Cys Asp Asn Ala Ser Ser Lys Glu
65 70 75

Glu Pro Lys Thr Asn Arg Leu His Ile Gly Asn His Cys Ala Asn
80 85 90

Lys Leu Thr Ala Phe Lys Pro Thr Ser Ser Lys Ser Ser Ser Glu
95 100 105

Ala Thr Leu Ser Ile Ser Pro Pro Arg Pro Thr Thr Leu Ser Leu
110 115 120

Asp Leu Thr Lys Asn Thr Thr Glu Lys Leu Gln Pro Ser Ser Pro
125 130 135

Lys Val Tyr Leu Tyr Ile Gln Met Gln Leu Cys Arg Lys Glu Asn
140 145 150

Leu Lys Asp Trp Met Asn Gly Arg Cys Thr Ile Glu Glu Arg Glu
155 160 165

Arg Ser Val Cys Leu His Ile Phe Leu Gln Ile Ala Glu Ala Val
170 175 180

Glu Phe Leu His Ser Lys Gly Leu Met His Arg Asp Leu Lys Pro
185 190 195

Ser Asn Ile Phe Phe Thr Met Asp Asp Val Val Lys Val Gly Asp
200 205 210

Phe Gly Leu Val Thr Ala Met Asp Gln Asp Glu Glu Glu Gln Thr
215 220 225

Val Leu Thr Pro Met Pro Ala Tyr Ala Arg His Thr Gly Gln Val
230 235 240

Gly Thr Lys Leu Tyr Met Ser Pro Glu Gln Ile His Gly Asn Ser
245 250 255

Tyr Ser His Lys Val Asp Ile Phe Ser Leu Gly Leu Ile Leu Phe
260 265 270

Glu Leu Leu Tyr Pro Phe Ser Thr Gln Met Glu Arg Val Arg Thr
275 280 285

Leu Thr Asp Val Arg Asn Leu Lys Phe Pro Pro Leu Phe Thr Gln
290 295 300

Lys Tyr Pro Cys Glu Tyr Val Met Val Gln Asp Met Leu Ser Pro
305 310 315

Ser Pro Met Glu Arg Pro Glu Ala Ile Asn Ile Ile Glu Asn Ala
320 325 330

Val Phe Glu Asp Leu Asp Phe Pro Gly Lys Thr Val Leu Arg Gln
335 340 345

Arg Ser Arg Ser Leu Ser Ser Ser Gly Thr Lys His Ser Arg Gln
350 355 360

Ser Asn Asn Ser His Ser Pro Leu Pro Ser Asn
365 370



<210> 15 

<211> 523 

<212> PRT 

<213> Homo sapiens 

<220> - 

<223> 3125156 

<400> 15 



Met Gly Pro Gln Ala Ala Pro Leu Thr Ile Arg Gly Pro Ser Ser
1 5 10 15

Ala Gly Gln Ser Thr Pro Ser Pro His Leu Val Pro Ser Pro Ala
20 25 30

Pro Ser Pro Gly Pro Gly Pro Val Pro Pro Arg Pro Pro Ala Ala
35 40 45

Glu Pro Pro Pro Cys Leu Arg Arg Gly Ala Ala Ala Ala Asp Leu
50 55 60

Leu Ser Ser Ser Pro Glu Ser Gln His Gly Gly Thr Gln Ser Pro
65 70 75

Gly Gly Gly Gln Pro Leu Leu Gln Pro Thr Lys Val Asp Ala Ala
80 85 90

Glu Gly Arg Arg Pro Gln Ala Leu Arg Leu Ile Glu Arg Asp Pro
95 100 105

Tyr Glu His Pro Glu Arg Leu Arg Gln Leu Gln Gln Glu Leu Glu
110 115 120

Ala Phe Arg Gly Gln Leu Gly Asp Val Gly Ala Leu Asp Thr Val
125 130 135

Trp Arg Glu Leu Gln Asp Ala Gln Glu His Asp Ala Arg Gly Arg
140 145 150

Ser Ile Ala Ile Ala Arg Cys Tyr Ser Leu Lys Asn Arg His Gln
155 160 165

Asp Val Met Pro Tyr Asp Ser Asn Arg Val Val Leu Arg Ser Gly
170 175 180

Lys Asp Asp Tyr Ile Asn Ala Ser Cys Val Glu Gly Leu Ser Pro
185 190 195

Tyr Cys Pro Pro Leu Val Ala Thr Gln Ala Pro Leu Pro Gly Thr
200 205 210

Ala Ala Asp Phe Trp Leu Met Val His Glu Gln Lys Val Ser Val
215 220 225

Ile Val Met Leu Val Ser Glu Ala Glu Met Glu Lys Gln Lys Val
230 235 240

Ala Arg Tyr Phe Pro Thr Glu Arg Gly Gln Pro Met Val His Gly
245 250 255

Ala Leu Ser Leu Ala Leu Ser Ser Val Arg Ser Thr Glu Thr His
260 265 270

Val Glu Arg Val Leu Ser Leu Gln Phe Arg Asp Gln Ser Leu Lys
275 280 285

Arg Ser Leu Val His Leu His Phe Pro Thr Trp Pro Glu Leu Gly
290 295 300

Leu Pro Asp Ser Pro ofcrX Asn Leu Leu Arg Phe Ile Gln Glu Val
305 310 315

His Ala His Tyr Leu nib 1 n Arg Pro Leu His Thr Pro Ile Ile
320 325 330

Val His Cys Ser Oa-r- DCl Gly Val Gly Arg Thr Gly Ala Phe Ala Leu
335 340 345

Leu Tyr Ala Ala Val pi n J. XI Val Glu Ala Gly Asn Gly Ile Pro
350 355 360

Glu Leu Pro Gln Leu Val Met Gln Gln Arg Lys His
365 370 375

Met Leu Gln Glu Lys Leu His Leu Arg Phe Cys Tyr Glu Ala Val
380 385 390

Val Arg His Val n~\ n J. 11 Val Leu Gln Arg His Gly Val Pro Pro
395 400 405

Pro Cys Lys Pro Leu Ser i-vx ct Ser Ile Ser Gln Lys Asn His
410 415 420

Leu Pro Gln Asp Ser \j±n Asp Leu Val Leu Gly Gly Asp Val Pro
425 430 435

Ile Ser Ser Ile Gln Ala I XIX Ile Ala Lys Leu Ser Ile Arg Pro
440 445 450

Pro Gly Gly Leu Glu Ser Pro Val Ala Ser Leu Pro Gly Pro Ala
455 460 465

Glu Pro Pro Gly Leu Pro Pro Ala Ser Leu Pro Glu Ser Thr Pro
470 475 480

Ile Pro Ser Ser Ser Gln Thr Pro Phe Pro Pro His Tyr Leu Arg
485 490 495

Leu Pro Ser Leu Arg Arg Ser Arg Gln Cys Leu Lys Pro Pro Ala
500 505 510

Arg Gly Pro Pro Pro Pro Pro Trp Asn Cys Trp Pro Pro
515 520











<210> 16 
<211> 846 
<212> DNA 

<213> Homo sapiens 

<220> - 
<223> 866885 

<400> 16 

ggcgggcgga gtctgcagga tggcaccgga cccctggttc cccacatacg attctacttg 60
tcaaattgcc caagaaattg ctgagaaaat tcaacaacga aatcaatatg aaagaaaagg 120
tgaaaaggca ccaaagctta ccgtgacaat cagagctttg ttgcagaacc tgaaggaaaa 180
gatcgccctt ttgaaggact tattgctaag agctgtgtca acacatcaga taacacagcc 240
tgaaggggac cgaagacaga acctcttgga tgatcttgta actcgagaga gactacttct 300
ggcatccttt aagaatgagg gtgccgaacc agatctaatc aggtccagcc tgatgagtga 360
agaggctaag cgaggagcac ccaacccctg gctctttgag gagccagagg agaccagagg 420
cntgggtittt catgaaatcc ggcaacagca gcagaaaatt atccaagaac aggatgcagg 480
ccttgatgcc ctttcctcta tcataagtcg ccaaaaacaa atggggcagg aaattgggaa 540
tgaattggar gaacaaaatg agataattga cgaccttgcc aacctagtgg agaacacaga 600
tgaaaaactt cgcaatgaaa ccaggcgggt aaacatggtg gacagaaagt cagcctct:g 660
tgggatgatc atggtgattt tactgctgct tgcggctatc gnggttgttg cagtctggcc 720
gaccaactga tggcagtaaa gagaccacca gcagtgacac ctggcaatga cagatgcaag 780
cccaacaccc ttttggtacg caaaacctgc tctcaataaa ttcccccaaa gctctgaaaa 840
aaaaaa 846
<210> 17

<211> 1897 

<212> DNA 

<213> Homo sapiens 

<220> - 
<223> 1273453 

<400> 17 

tgcacatcta gcacaaattg aagatgatag agctgcgatg gttatttctt ggcatctggc 60
aagtgacatg gactgtgtag tcaccctaac cactgacgct gcacgtcgta tctatgatga 120
aacccaaggt cgtcagcagg tgttgcccct tgattctatt tacaagaaga ctcttccaga 180
ttggaaaaga tctctacctc atttccgaaa tggaaaattg tattttaaac ccattggaga 240
tccagtcttt gctcgagact tgttaacatt tccagataat gtagaacatt gtgaaacagt 300
atttggtatg ctgttaggag acaccattat tttggataat ctggatgcgg ccaatcatta 360
tagaaaagag gttgttaaaa ttacacactg tcctacactg ctgaccagag atggagatcg 420
aattcgaagt aatggaaagt ttgggggcct tcagaataaa gctcctccaa tggataaact 480
tcggggaatg gtatttggag ctccagttcc aaaacagtgt ctgatcttag gggaacaaat 540
agatcttctt cagcagtatc gttctgctgt gtgcaaacta gacagtgtga ataaggatct 600
taacagtcaa ttagagtacc ttcgcactcc ggatatgagg aagaaaaagc aagaacttga 660
tgaacatgag aaaaatctca aactaataga ggaaaaacta ggtatgactc ccatacgtaa 720
gtgtaatgac tcattgcgtc attcaccaaa ggttgagacg acagattgtc cagttcctcc 780
taaaagaatg agacgagaag ctacaagaca aaataggatt ataaccaaaa cagatgtatg 840
agaggtgaca gagagaagag gccattggtc tcagtaagaa tgccctgctt tctgcatctc 900
tgttxcagaa gaccaagagg gtgacttacc agactgagta tttctgggga caatacaagt 960
acctgggcat gaatttccat ttcgattcag atgggactgg aaacaaccat tcaattttat 1020 
gaatcttact ggacattatg gatttactgg aattattcca gacattatgc cctttggttg 1080 
tcactacctt gcaaatgtgt aagaggaaaa tgtgctaatg tggcagtgac tgtaaaactg 1140
gcacatggca tttattaatc ctgaagaaaa gtacatgtac tatttttcag tataaatata 1200 
atgaacatgt cagaactatt tcttgaaaac ctttttatta cttttgcgtg aatttattta 1260 
acaaagatgt tttgtctttt gtgtaaggga ggttctagag gctagatgtt taattgtaaa 1320 
tatgtgagga aactcaatgc agaattcagg ataaaaattt taaaagcaca ggtatttggg 1380
aattgaaatg ttaagatacc cagaacaaca ttaaatcaat gagtgaactt gtgacagtgg 1440
tagcatttca aatttcaaaa gacttatcct gtgtgtgtgt gtgtgtgtgt atatatatat 1500 
atatatatat aaatatatat atataaaata ttcagcagca ccaagtttta taactattgt 1560 
ttgtttgact ttattaatac tagaatatgt agtctcagcc ttaattttac atttacatta 1620 
ttttgtaatt ttttattact atttttaagg ggttaaagag aacatacatt ctcacactag 1680 
tgtactttct ggtagaaagt tgctgcaaaa acatttgaaa tgtatattaa cctaatgtat 1740 
gtcatatata tgcctttgtg taagttcaag actattgatc tgtgaagtta ttttgtaagg 1800 
acatacattt ggtaagtaag tttgtgtccc aggaaatgta tgtgttttta aaccctttct 1860 
aaatatgcag gccattaata aataagattg tgtctca 1897 

<210> 18 
<211> 2272 
<212> DNA 

<213> Homo sapiens 

<220> - 

<223> 1534876 

<400> 18

ccatgctcca ggcatacaga tgtggtttct cggctgcacc gggccaggct gcgggrgtgc 60

aggcgtctgc aaagttgtgc catgtatcag cacaggcttt gagacgtctg gacccrgtcc 120 

ttcctcccgt gaggggttct tgttctttct gactcaggtg acttttcagc ccttccaatt 180 

cccctctttt tctgccctcc cctccaactc agccaaccca ggtgtgggca gtcagggagg 240 
gagggagtgt cccaccacgt tctcagggca gcccttgact cctaagcccc ttcctccttc 300
catcctgcat cccczcccca tccaacctaa atgcccacag ctggggctga gctgtattcc 360
tgtggaggga cctctgccgt gcctctctga ggtcaggctg tgctgtgtga tgggcaggct 420
ttgccccagc ccacccctgg caaggtgcac ttgttttctg gtttgtacaa ggtgtcctgg 480 
gggcccgtcg cttccctgcc agtgaggagt gacttctccc tctcttccag tcctgtaggg 540 
gagacaaaac cagattgggg ggcccaaggg gagcatggaa aaggccggct cccctgtctt 600 
tccttggctg tcagagtcag ggtaacacac accaagagtg gagtgcggcc agcaagtttg 660
agacctgccc gccctcctcg cagctctgct ctgtgtcctc aggaagtcac agagtctact 720
gaggcaagga gagggtgatt ctttccccaa accccttctt ccctggttcc caaaccaaag 780 
acagcctgca gccctttctg catggggtgc tctgttgaca ggcttcccag atccctgagt 840
ctctctttcc ttcctcctcg atctttagtt gtccacggtc aattcagtgc ttccattggg 900 
ggacagtccc ctccgggatg acctgactca cctccagccc agggaatgga atctagagga 960
atacgtgggg tgggtctgga caaggagcgg caggaatcac cacccatctc cagctgtgga 1020 
gccctgtgga ggggaagggg aagcttgggg ttcagaggga actcttccag gagaggggtg 1080 
cccagcggag gcaaagatga cagagggttg tggggggtct ctagttgaat gttttggccc 1140
atgactttgg aacatggctg gcagcttcca gcagaagtca cgctccccat cccccagggg 1200 
acataggacc tttttcctgc ttcctggtca ctttcaaaga actatttgcg caatctgtgg 1260 
gtctgtggat tcacggggct ttctgtgtgg gtgctgcagt tgcttttgtc tgcagcagca 1320 
ggacacatct ttcctcttac tcagcccttt atggcccatg gggaactccg tggctcaggg 1380 
agagctgaac tccaggggtg tgacctggga caggtgggcc tgaggtgccc agctcagggc 1440
agccaggtgg ctcatgggct gtagtgagcc agctccctgg gggaaaaggc tgtgggccgt 1500 
taggaccatc ctccaggaca ggtgacctct atgaggtcac ctacggctgt ggccgtgcag 1560
gcctccttcc agcccagagt ggcccagtag agcaaggcag acagtgacct ccacccccgc 1620 
agccctctta aaaggccagt actcttgggg gtggggggag ggtttagaaa gcatttgccc 1680 
atctgccttt ctttccccca gcccccaccc gctttgaatg tagagacccg tgggcacttt 1740
tccttttgtg gtggggggtg cggaggaggt acccccaccc ctggcacagc cgcctggaat 1800 
gcaggactgt cactgctgtt cgggtgatga cctcgttgcc aagctcctcc tgtccccttg 1860 
ttctgggggc aggcgctgtg cttctgtgag gtggtttagc ttttgctttc gaagtggcca 1920 
gctgcggcca ccaggtctca gcacaagagc gcttcctttg cacagaatga gcttcgagct 1980 
ttgttcagac taaatgaatg tatctgggag gggtcggggg cacgagttga ttccaagcac 2040 
atgcccttgc tgagtgtgtg tgtgctggga gagtcagagt ggatgtagag cgcggtttta 2100 
tttttgtact gacattggta agagactgta tagcatctat ttatttagat gatttatctg 2160 
gtaaatgagg caaaaaaatt attaaaaata cattaaagat gatttaaaaa aaagaccaaa 2220 
aaaccaagaa acccaaagcc caagaatgcg cgtagcatcc aaaaaaaaaa gg 2272 



<210> 19 
<211> 992 
<212> DNA 

<213> Homo sapiens 

<220> - 

<223> 1634813 



<400> 19

gacagcttgg cctacagccc ggcgggcatc agctcccttg acccagtgga tatcggtggc 60
cccgttattc gcccaggtgc ccagggagga ggacccgcct gcagcatgaa cctgtggctc 120 
ctggcctgcc tggtggccgg cttcctggga gcctgggccc ccgctgtcca cgcccaaggt 180 
gtctttgagg actgctgcct ggcctaccac taccccattg ggtgggctgt gctccggcgc 240 
gcctggactt accggatcca ggaggtgagc gggagctgca atctgcctgc tgcgatattc 300
tacctcccca agagacacag gaaggtgtgt gggaacccca aaagcaggga ggtgcagaga 360 
gccatgaagc tcctggatgc tcgaaataag gtttttgcaa agctccgcca caacacgcag 420 
accttccaag caggccctca tgctgtaaag aagttgagtt ctggaaactc caagttatca 480 
tcatccaagt ttagcaarcc catcagcagc agcaagagga atgtctcccc cctgatatca 540 
gctaattcag gactgtgagc cggcccattt ctgggctcca tcggcacagg aggggccgga 600
nctttctccg ataaaaccgt cgccccacag acccagctgt ccccacgcct ctgtcttttg 660 
ggtcaagtct taatccctgc acctgagttg gtcctccctc tgcaccccca ccacctcctg 720

cccgtctggc aactggaaag agggagttgg cctgatttta agccttttgc cgctccgggg 780 

accagcagca atcctgggca gccagtggct cttgtagaga agacttagga cacctctctc 840 

actttctgtt tctcgccgtc caccccgggc catgccagtg tgtccctctg ggtccctcca 900 

aaactctggt cagttcaagg atgcccctcc caggctatgc ttttctataa cttttaaana 960 

aaccttgggg gttgatggag tcaaaaaaaa aa 992 



<210> 20 

<211> 810 

<212> DNA 

<213> Homo sapiens 

<220> - 

<223> 1711840 

<400> 20 

cgagtgagcg cgcggcggcc cctggtccgc ccggccgcgg ccgatctagg ggctgggggc 60
tggaggcggg ggtgggggtc tgagctgcgt cctgggctcg aggcgtcccc cggggagtcg 120
cctcttagcg gtgcgtccgg gctagcggcg aggggccgcc ccaagtcttc ccaccgccgc 180
caccttagca gcccgacttg gggcctggaa agtggagcac gcggaggtgg gagggccctg 240
cacgcggccc ccggtgggga aggggacggg ccagggattc agactcgggc tctcccctca 300
ggatgcagca ccgaggcttc ctcctcctca ccctcctcgc cctgctggcg cccacctccg 360
cggtcgccaa aaagcaagat aaggtgaaga agggcggccc ggggagcgag tgcgctgagt 420
gggcctgggg gccctgcacc cccagcagca aaggatttgc ggcagtgggt tttccgcgag 480
ggccaccttg ggggggccca agaacccaac cggcagtcct ggttgaaagg gttgcccctg 540
gaaagttgga aagaaaggag ttttgggcac ccggactttg gaaagttggc caaatttttt 600
ggaagaaaac ttggcgggtc tgccggtccg ttaaatgggg gaggggacaa aagaattgaa 660
agccgaaaaa atgctttctc cgccgccaag agaggtcgaa cccgcgtctg gcaagaagag 720
aaaagggcgc gcccacactg ttaacaacaa tatggcgcct gaacagttgg tggcaccaca 780
gggggaggga gacacatact tgcgcgcggt 810



<210> 21 

<211> 1064 

<212> DNA 

<213> Homo sapiens 

<220> - 

<223> 1747327 

<400> 21 

ttcctggggc tccggggcgc ggagaagctg catcccagag gagcgcgtcc aggagcggac 60
ccgggagngt ttcaagagcc agtgacaagg accaggggcc caagtcccac cagccatgca 120 
gacctgcccc ctggcattcc ctggccacgt ttcccaggcc cttgggaccc tcctgttttt 180 
ggctgcctcc ttgagtgctc agaatgaagg ctgggacagc cccatctgca cagagggggt 240 
agtctctgtg tcttggggcg agaacaccgt catgtcctgc aacatctcca acgccttctc 300 
ccatgtcaac atcaagctgc gtgcccacgg gcaggagagc gccatcttca atgaggtggc 360 
tccaggctac ttctcccggg acggctggca gctccaggtt cagggaggcg tggcacagct 420 
ggtgatcaaa ggcgcccggg actcccatgc tgggctgtac atgtggcacc tcgtgggaca 480 
ccagagaaat aacagacaag tcacgctgga cgtttcaggt gcagaacccc agtccgcccc 540 
cgacactggg ttctggcctg tgccagcggt ggtcactgct gtcttcatcc tcttggtcgc 600
tctggtcatg ttcgcctggc acaggtgccg crgttcccag caacgccggg agaagaagtt 660 
cttcctccta gaaccccaga tgaaggtcgc agccctcaga gcgggagccc agcagggcct 720 
gagcagagcc ^ccgctgaac tgtggacccc agactccgag cccaccccaa ggccgctggc 780
actggtgtcc aaaccctcac cacttggagc cctggagctg ctgtcccccc aacccttgtt 840 
tccaracgcc gcagacccat agccgcctgc aaggaagaga ggacacagga gtagccaccc 900
tgagcgccga cctttggtgg cgggggcctg ggtctctcgt ccccacccgg aagggcacaa 960
gacaccgggc tttgcttggc aaggcttggg gcctcttgtg gtcaacccag ttcccttggg 1020
tgccgttgca gaacccctta gccccttcca acgtcgacca ggtt 1064



<210> 22 

<211> 1336 

<212> DNA 

<213> Homo sapiens 



<220> - 

<223> 1864292 



<400> 22 

agctcgtacc cctcgagtga aattctgaaa tgaagatgga ggaggcagtg ggaaaagttg 60
aagaactcat tgagtccgaa gccccaccaa aagcatctga acaagagaca gccaaggagg 120
aagatggatc tgtagaactg gaatctcaag ttcagaaaga tggtgtagcg gattctacag 180
ttatttcttc aatgccctgc ttgttgatgg aactgagaag ggactcttct gagtctcagt 240
tagcatccac agagagtgac aagcctacaa ccggccgagt ttatgagagt gacccctcta 300
atcactgcat gctttcccct tcctctagtg gtcacctggc tgattcagat acgttgtctt 360
ccgcagaaga gaatgaaccc tctcaggcag aaacggcggt agaaggagac ccttcaggag 420
tgtctggtgc cacagttggg cgcaagtcta ggcggtcccg atctgaaagt gaaacttcca 480
ctatggctgc caagaaaaac cggcaatcca gtgataaaca gaatggccga gtcgccaagg 540
ttaaaggtca tcggagccaa aagcacaagg agaggatcag gctactgagg cagaaacggg 600
aggctgctgc aaggaagaaa tataacctgc tgcaggacag tagtaccagt gatagtgacc 660
tgacttgtga ctcaagcacg agctcatcag atgatgatga agaggtttca gggagcagca 720
agacaatcac tgcagagata ccagatggac ctccagttgt agctcattat gatatgtctg 780
acaccaactc tgacccagaa gtggtaaatg tggacaattt attggcggct gcagtagttc 840
aagagcacag taattctgta ggcggccagg acacaggagc tacctggagg accagcgggc 900
ttctagagga gctgaatgca gaggcaggtc atttggatcc aggattccta gcaagtgaca 960
aaacatctgc tggcaatgcg ccactcaatg aagaaattaa cattgcgtct tcagatagtg 1020
aagtagagat tgtgggagtt caggaacatg caaggtgtgt tcatcctcga ggtggtgtga 1080
ttcagagtgt ttcttcatgg aagcatggct cgggcacgca gtatgttagc accaggcaaa 1140
cacagtcatg gactgctgtg actccccagc agacttgggc ttcaccagca gaagttgttg 1200
accttacctt ggatgaggat agcaggcgta aacacctact gtaatacaat gtcactgtgn 1260
ttcccctgca ctgttccctt ccacttcctc atcctctttg tgacatggaa gttcattgtc 1320
ataggggnac ggagct 1336



<210> 23 

<211> 1742 

<212> DNA 

<213> Homo sapiens 



<220> - 

<223> 1866437 



<400> 23 

gccccgcccc ctccccgccc gccttcccgg tgaccttcag gggcccgggt ggcgggcgca 60
ggcccctgcg gcggcggcgg gatgttcgtg caggaggaga agarcttcgc gggcaaggtg 120
crgcggctgc acatctgcgc gtccgacggc gccgagtggc tggaggaggc caccgaggac 180
acctcggtgg agaagctcaa ggagcgctgc ctcaagcact gtgctcatgg gagcttagaa 240
gatcccaaaa gtataaccca ncataaatta atccacgctg cctcagagag ggtgctgagt 300
gatgccagga ccatcctgga agagaacatc caggaccaag atgtcc^act attgaaaaaa 360
aagcgtgctc catcaccact tcccaagatg gctgatgtct cagcagaaga aaagaaaaaa 420
caagaccaga aaactccaga taaagaggcc atactgcggg ccaccgccaa cctgccctcc 480
tacaacatag accgggccgc ggnccagacc aacatgagag actcccagac agaactccgg 540
aagatactcgg tgtctctcat cgaggtggcg cagaagctgt tagcgctgaa cccagatgcg 600 
g tgcraat:tgt ttaagaaggc gaatgcaatg ctggacgagg acgaggatga gcgtgtggac 660 
gaggctgccc tgcggcagct cacggagatg ggctttccgg agaacagagc caccaaggcc 720
cttcagctga accacatgtc ggtgcctcag gccatggagt ggctaattga acacgcagaa 780
gacccgacca tagacacgcc tcttcctggc caagctcccc cagaggccga gggggccaca 840
gcagctgcct ccgaggctgc cgcgggagcc agcgccaccg atgaggaggc cagagatgag 900
ccgacggaaa tcttcaagaa gatccggagg aaaagggagt ttcgggctga tgctcgggcc 960 
gtcatttccc tgatggagat ggggttcgac gagaaagagg tgatagatgc cctcagagtg 1020 
aacaacaacc agcagaatgc cgcgtgcgag tggctgctgg gggaccggaa gccctctccg 1080 
gaggagctgg acaagggcat cgaccccgac agtcctctct ttcaggccac cctggataac 1140 
ccggtggtgc agctgggcct gaccaacccg aaaacattgc tagcatctga agacatgctg 1200 
gagaacccac tgaacagcac ccagtggatg aatgatccag aaacggggcc tgtcatgctg 1260
cagatctcta gaatcttcca gacactaaat cgcacgtagg tggcgttgtt ccactcggct 1320
atcaggccac agcagccccc tggtgcggcc cgagaccggg cagagtggac ctcacctgga 1380 
aactcacctt cagcgcctca gccctggact gttagaggtg ctgcagctgc tcctgctctc 1440 
tgatcttatt gcttataaac tttggtgacg gtagtgtgta aggccgtatt tttagcatct 1500 
gacaggtgtt tacaaaaaag tggttgtcgc actgggaagt ggagtgatgg cctcgtctcc 1560
agtgctcctc tgggctcttg agttgctgct tgaattgccg tgtagacatt tgcttggaga 1620 
gtccacttgt tatttgacgg aggtaggttt caacccagag ttaatgtcaa gcatgctaat 1680 
ttaactagtc actcacagat gacttttctt taataaagtc ccttttccta ttaaaaaaaa 1740

1742 



<210> 24 

<211> 1074 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> unsure 

<222> 546, 548, 573, 577, 588, 610, 615, 618, 647, 658, 676, 690, 
<221> unsure 

<222> 708, 718, 730, 738, 741, 770, 775, 780, 783, 803, 813, 818, 
<221> unsure 

<222> 819, 839, 844, 864, 870, 890, 901, 903, 918, 921, 943, 955, 
<221> unsure

<222> 960, 957, 975, 990, 997, 999, 1000, 1007, 1040, 1053, 1055, 
<221> unsure 
<222> 1065, 1071 

<223> a or g or c or t, unknown, or other 



<220> - 

<223> 1871375 



<400> 24 

gcggtgcaga ggaagcacaa cctc-accgg gacagcatgg tcatgcacaa cagcgacccc 60 

aacctgcacc tgctggccga gggcgccccc atcgactggg gcgaggagta cagcaacagc 120 

ggcgggggcg gcagcccagc cccagcaccc cggagtcagc caccctctcg gaaaagcgac 180 

ggcacgccaa gcaggtggtc tctgtggtcc aggatgagga ggtggggctg ccctttgagg 240 

ctagccctca gtcaccacca cctgcgtccc cggacggtgt cactgagatc cgaggcctgc 300 

tggcccaagg tctgcggcct gagagccccc caccagccgg ccccctgcrc aacggggccc 360 

ccgctgggga gagtccccag cctaaggccg cccccgaggc ctcctcgccg cctgcctcac 420 

ccctccagca tctcctgcct ggaaaggctg tggaccttgg gccccccaag cccagcgacc 480

aggagactgg agagcaggtg tccagcccca gcagccacc- cgc:ctccac accaccaccg 540 
aggacnantt tcaaggggtg caagaattga agnttcntaa gggccaantt gggggtcccc 600

ttgactnggn ttggnaanat tggggcaaaa agggccggtt ttccccnttt cccggganac 660 

cccaagggaa aggggnttca aagcttcttn gggggggaaa gggggaancc ctcgggtntt 720 

ttgttggccn tttgtganca ncagcgagga gagtgcaaag gtgcagagtn agttntaggn 780 

cantgggtcc ctgactgctg canatggtaa ggncgttnnc ttgrggaccc aaggcaggna 840

aagntgtggg gagggaagct ggtntgtgcn ttgtgggtgg aagcggggan ggctgtgttg 900 

nanggcaggg agagggcnaa ntgagttatt tattggggtt cangtgaaaa gtttcttgnn 960

ccctgtnttg tgttnctgtg ggattgattn taagatngnn aggggtnggt ttttggggtt 1020 

ttcctggttg gtggccaaan gggttggaaa atngntgggg ggggnttgga naat 1074 



<210> 25 

<211> 1454 

<212> DNA 

<213> Homo sapiens 

<220> - 

<223> 1880830 

<400> 25 

cccgggggag gcctgacccc ctccgcacca ccgtacggag ccgcatttcc cccgtttccc 60
gaggggcatc cagccgtgtt gcctggggag gacccacccc cctattcacc cttaactagc 120
ccggacagtg ggagtgcccc tatgatcacc tgccgagtct gccaatctct catcaacgtg 180
gaaggcaaga tgcatcagca tgtagtcaaa tgtggtgtct gcaatgaagc caccccaatc 240
aagaatgcac ccccagggaa aaaatatgtt cgatgcccct gtaactgtct ccttatctgc 300
aaagtgacat cccaacggat tgcatgccct cggccctact gcaaaagaat catcaacctg 360
gggcctgngc atcccggacc tctgagtcca gaaccccaac ccatgggtgt cagggttatc 420
tgtggacatt gcaagaatac ttttctgtgg acagagttca cagaccgcac tttggcacgt 480
tgtcctcact gcaggaaagt gtcatctatt gggcgcagat acccacgtaa gagatgtatc 540
tgctgcttct tgcttggctt gcttttggca gtcactgcca ctggccttgc ctttggcaca 600
tggaagcatg cacggcgata tggaggcatc tatgcagcct gggcatttgt catcctgttg 660
gctgtgctgt gtttgggccg ggctctttat tgggcctgta tgaaggtcag ccaccctgtc 720
cagaacttct cctgagcctg atgacccaca gactgtgcct ggcccctccc tggtggggac 780
agtgacacta cgaagggagc tggggtagtt aaaggctccc ggggcttcta gaaggaagcc 840
aagcagctgc cttccttttc cctggggaga ggtaggaagg aaccaggccc tcacttaggt 900
ttggaggggc agataagagc actgctgacc atctgctttc ctccaagggt tgctgtgtct 960
agggtgaagt aggcaaaacg ttgcccttaa aactgggccc tgaagacggt tccagccttg 1020
tccttcctgt gtgctccctg agagccattc ctgtccctta cacattccag ggcagggtgg 1080
gggtgggtag ccctgggggt tcccctccct cttgtgcacc attaggactt tgctgctgct 1140
attgcacttc accagaggtt ggctctggcc tcagtacccrt cagtctcctc tccccacatt 1200
gtgtcctgtg ggggtggggt cagccgctgc tctgtacaga accacaggaa ctgatgtgta 1260
tataactatt taatgtggga tatgttcccc tattcctgta tttcccttaa ttccccctcc 1320
cgaccttttt taccccccca gttgcagtat ttaactgggc tgggtagggt tgctcagtct 1380
ttgggggagg ttagggactt atcctgtgct tgtaaataaa taaggtcatg actctaaaaa 1440
aaaaaaaagg gcgg 1454



<210> 26 

<211> 1121 

<212> DNA 

<213> Homo sapiens 

<220>

<223> 2328134 

<400> 26 
gcgggggatg acgccacgga catggtggcc gagaccggcg gggtggggga cgtgtcgcgc 60

gcrccgggtgg cctcggtcgg taccctigggc gcggacagct gcctcattag tattcgtacc 120 

cacgaggcgg cgcagcgggc ccccggggac agcgagcgtc gcggcratgg cttatcactc 180

gggctacgga gcccacggct ccaagcacag ggcccgggca gccccggatc cccctcccct 240

cctcgatgac acaagcggtg gttatnccag ccagcccggg ggatacccag ccacaggagc 300 

agacgtggcc ttcagtgtca accacttgct tggggaccca atggccaatg tggctacggc 360

ctatggcagc tccatcgcac cccatgggaa ggacatggtg cacaaggagc tgcaccgttt 420

tgtgtctgtg agcaaactca agtatttttt tgctgtggac acagcctacg tggccaagaa 480

gctagggctg ctggtcttcc cctacacaca ccagaactgg gaagtgcagt acagtcgtga 540

tgctcctctg cccccccggc aagacctcaa cgcccctgac ctctatatcc ccacgatggc 600

cttcattact tacgrgctcc tggctgggat ggcactgggc attcagaaaa ggttctcccc 660 

ggaggtgctg ggcctgtgcg caagcacagc gctggtgtcg gtggtgatgg aggtgctggc 720

cctgctcctg ggcctctacc tggccaccgt gcgcagtgac ctgagcacct ttcacctgct 780

ggcctacagt ggctacaaat acgtgggaat gatcctcagt gtgctcacgg ggctgctgtt 840

cggcagcgat ggctactacg tggcgctggc ctggacctca tcggcgctca tgtacttcat 900

tgtgcgctct ttgcggacag cagccctggg ccccgacagc atggggggcc ccgtcccccg 960

gcagcgtctc cagctctacc tgactctggg agctgcagcc ttccagcccc tcatcatata 1020 

ctggctgact ttccacctgg tccggtgacc ccctggcccc agatggcact gagtttttca 1080 

ttcattgaag atttgatttc cttgaaaaaa aaaaaaaaag g 1121 



<210> 27 

<211> 1229 

<212> DNA 

<213> Homo sapiens 

<220>

<223> 2652271 

<400> 27 

ctctctgctc cggtgcaggc ccgcaggcgc cctgggctgg gagcaacgcg actgaccgtg 60
gtcgtgggcg gacggcggct gcagcgtgga ggagctgggg tcgctgtggg tcgcgaacag 120
agcccgggac gtgcgcgctt ggtgcacgat cctgaagggg agctccgagg ggcccgggtc 180
tccagggctg ctgcggccat tcccggagcc cggcgcgggg cccgcgagat actggtttag 240
gccgtcccag ggctccgggc gcacccggtg gccgctgctg cagcggaggg agcgcggcgg 300
cgcgggggct cggagacagc gtttctcccg gaagtcttcc tcgggcagca ggtgggaagt 360
gggagccgga gcggcagctg gcagcgttct ctccgcaggt cggcaccatg cgccctgcag 420
ccctgcgcgg ggccctgctg ggcxgcccct gcctggcgtt gctttgcctg ggcggtgcgg 480
acaagcgcct gcgtgacaac catgagtgga aaaaactaat tatggttcag cactggcctg 540
agacagtatg cgagaaaatt caaaacgact gtagagaccc tccggattac tggacaatac 600
atggactatg gcccgataaa agtgaaggat gtaatagatc gtggcccttc aatttagaag 660
agattaagga tcttttgcca gaaatgaggg catactggcc tgacgtaatt cactcgtttc 720
ccaatcgcag ccgcttctgg aagcatgagt gggaaaagca tgggacctgc gccgcccagg 780
tggatgcgct caactcccag aagaagtact ttggcagaag cctggaactc tacagggagc 840
tggacctcaa cagtgtgctt ctaaaattgg ggataaaacc atccatcaa: tactaccaag 900
ttgcagattt taaagatgcc cttgccagag tatatggagt gatacccaaa atccagtgcc 960
ttccaccaag ccaggatgag gaagtacaga caattggtca gatagaactg tgcctcacta 1020
agcaagacca gcagctgcaa aactgcaccg agccggggga gcagccgccc cccaagcagg 1080
aagtctggct ggcaaatggg gccgccgaga gccggggtct gagagtctgt gaagatggcc 1140
cagtcttcta tcccccacct aaaaagacca agcatrgatg cccaagtttt ggaaatattc 1200
tgttttaaaa agcatgaggt aggcatgcc 1229



<210> 28 
<211> 2295 
<212> DNA 



<213> Homo sapiens

<220>
<223> 2965248 



<400> 28 

gtctgcagct ccggccgcca cttgcgcctc tccagcctcc gcaggcccaa ccgccgccag 60
caccatggcc agcaccattt ccgcctacaa ggagaagatg aaggagctgt cggtgctgtc 120
gctcatctgc tcctgcttct acacacagcc gcaccccaat accgtctacc agtacgggga 180
catggaggtg aagcagctgg acaagcgggc ctcaggccag agcttcgagg tcatcctcaa 240
gtccccttct gacctgtccc cagagagccc tatgctctcc tccccaccca agaagaagga 300
cacctccctg gaggagctgc aaaagcggct ggaggcagcc gaggagcgga ggaagacgca 360
ggaggcgcag gtgctgaagc agctggcgga cggcgcgagc acgagcgcga ggtigctgcac 420
aaggcgctgg aggagaataa caacttcagc cgccaggcgg aggagaagct caactacaag 480
atggagctca gcaaggagat ccgcgaggca cacctggccg caccgcgcga gcggctgcgc 540
gagaaggagc tgcacgcggc cgaggtgcgc aggaacaagg agcagcgaga agagatgtcg 600
ggctaagggc ccgggacggg cggcgcccat cctgcgacgg aacacgttcg ggttttggtt 660
ttgtttcgtt cacctctgtc tagatgcaac ttttgttcct cctcccccac cccagccccc 720
agcttcatgc ttctcttccg cactcagccg ccctgccctg tcctcgtggt gagtcgctga 780
ccacggcttc ccctgcagga gccgccgggc gtgagacgcg gtccctcggt gcagacacca 840
ggccgggcgc ggctgggtcc cccgggggcc ctgtgagaga ggtggtggtg accgtggtaa 900
acccagggcg gtggcgtggg atcgcgggtc cttacgctgg gctgtctggt cagcacgtgc 960
aggtcagggc aggtcctctg agccggcgcc cccggccagc aggcgaggct acagtacctg 1020
ctgtctttcc agggggaagg ggctccccat gagggagggg cgacggggga ggggggtgat 1080
ggtgcctggg agcctgcgcg tgcagccggt gcttgttgaa ctggcaggcg ggtgggtggg 1140
ggctgcagct ttccttaatg tggttgcaca ggggtcctct gagaccacct ggcgtgaggt 1200
ggacaccctg ggccttcctg gaagcctgca gttgggggcc tgccctgagc ctgctgggga 1260
gtgggcattc tctgccaggg acccatgagc aggctgcatg gtctagaggt tgtgggcagc 1320
atggacagtc ccccactcag aagtgcaaga gttccaaaga gcctctggcc caggcccctc 1380
cgtgggacag ccccgccgcc cctccccacc agggctttgc agatgtcctt gaaagaccca 1440
ccctagagcc ctttggagtg ctggcccctc ctgtgccctc tgccctggtg gaagcggcag 1500
ccacaagtcc tcctcaggga gccccaaggg ggattttgtg ggaccgctgc ccacagatcc 1560
aggtgttgga agggcagcgg gtaaggttcc caagccagcc ccaacaccct tcccacttgg 1620
cacccagagg gggctgtggg tggaggcctg actccaggcc tctcctgccc acaccctctg 1680
ggctgagttc cttctttccc ttggacgccc agtgctggcc ttggaggacg gtcagccgga 1740
ggatggcggt gggggaggct gtctttgtac cactgcagca tcccccactt ctccacggaa 1800
gccccatccc aaagctgctg cctggcccct tgctgtaaag tgtgaagggg gcggctgagt 1860
tctcttagga cccagagcca gggccctcaa cttccatcct gcgggaggcc ttggccgggc 1920
actgccagtg tcttccagag ccacacccag ggaccacggg aggatcctga cccctgcagg 1980
gctcaggggt cagcagggac ccactgcccc atctccctct ccccaccaag acagccccag 2040
aaggagcagc cagctgggat gggaacccaa ggctgtccac atctggcttt tgtgggactc 2100
agaaagggaa gcagaactga gggctgggat attcctcatg gtggcagcgc tcatagcgaa 2160
agcctactgt aatatgcacc catctcatcc acgtagtaaa gtgaacttaa aaattcaatc 2220
aaatgaacaa ttaaataaac acctgtgtgt ttaagacaaa ataaaaatgg aggagaacaa 2280
aaaaaaaggg gcggt 2295



<210> 29 

<211> 2215 

<212> DNA 

<213> Homo sapiens

<220>

<223> 305766S 

<400> 29 
cccacgcgtc cgcccacgcg tccgtnttca gtagggattn cccgtgacca gacaagttca 60
tctgacragcc agttctcacc actggaattc tcaggaatgg accatgagga catcagtgag 120
tcagtggatg cagcatacaa cctccaggac agttgcctta cagactgtga tgtggaagat 180 
gggactatgg atggcaatga tgaggggcac tcctttgaac tttgtccttc tgaagcttct 240 
ccttatgtaa ggtcaaggga gagaacctcc tcttcaatag tatttgaaga ttctggctgt 300 
gacaatgctt ccagtaaaga agagccgaaa actaatcgat tgcatartgg caaccattgt 360 
gctaataaac taactgcttt caagcccacc agtagcaaat cttcttctga agctacattg 420 
tctatttctc ctccaagacc aaccacttta agtttagatc tcactaaaaa caccacagaa 480 
aaactccagc ccagttcacc aaaggtgtat ctttacattc aaatgcagct gtgcagaaaa 540
gaaaacctca aagactggat gaatggacga tgtaccatag aggagagaga gaggagcgtg 600
tgtctgcaca tcttcctgca gatcgcagag gcagtggagt ttcttcacag taaaggactg 660
atgcacaggg acctcaagcc atccaacata ttctttacaa tggatgatgt ggtcaaggtt 720
ggagactttg ggttagtgac tgcaatggac caggatgagg aagagcagac ggttctgacc 780 
ccaatgccag cttatgccag acacacagga caagtaggga ccaaactgta tatgagccca 840 
gagcagattc atggaaacag ctattctcat aaagtggaca tcttttcttt aggcctgatt 900 
ctatttgaat tgctgtatcc attcagcact cagatggaga gagtcaggac cttaactgat 960 
gtaagaaatc tcaaatttcc accattattt actcagaaat atccttgtga gtacgtgatg 1020
gttcaagaca tgctctctcc atcccccatg gaacgacctg aagctataaa catcattgaa 1080
aatgctgcat tcgaggactt ggactttcca ggaaaaacag tgctcagaca gaggtctcgc 1140
tccttgagtt catcgggaac aaaacattca agacagtcca acaactccca tagccctttg 1200
ccaagcaatt agccttaagt tgtgctagca accctaatag gtgatgcaga taatagccta 1260
cttcttagaa tatgcctgtc caaaattgca gacttgaaaa gtttgttctt cgctcaattt 1320
ttttgtggac tacttttttt atatcaaatt taagctggat ttgggggcat aacctaattt 1380
gagccaactc ctgagttttg ctatacttaa ggaaagggct atctttgttc tttgttagtc 1440
tcttgaaact ggctgctggc caagctttat agccctcacc atctgcctaa ggaggtagca 1500
gcaatcccta atatatatat atagtgagaa ctaaaatgga tatattttta taatgcagaa 1560
gaaggaaagt ccccctgtgt ggtaactgta ttgttctaga aatatgcttt ctagagatat 1620
gatgattttg aaactgattt ctagaaaaag ctgactccat ttttgtccct ggcgggtaaa 1680
ttaggaatct gcactatttt ggaggacaag tagcacaaac tgtataacgg tttatgtccg 1740 
tagttttata gtcctatttg tagcattcaa tagctttatt ccttagatgg ttctagggtg 1800 
ggtttacagc tttttgtact tttacctcca ataaagggaa aatgaagctt tttatgtaaa 1860
ttggttgaaa ggtctagttt tgggaggaaa aaagccgtag taagaaatgg atcatatata 1920 
ttacaactaa cttcttcaac tatggacttt ttaagcctaa tgaaatctta agtgtcttat 1980 
atgtaatcct gtaggttggt acttccccca aactgattat aggtaacagt ttaatcatct 2040 
cacttgctaa catgttttta tttttcactg taaatatgtt tatgttttat ttataaaaat 2100 
tctgaaatca atccatttgg gttggtggtg tacagaacac acttaagtgt gttaacttgt 2160 
gacttctttc aagtctaaat gatttaataa aacttttttt aaattaaaaa aaaaa 2215



<210> 30 

<211> 2060 

<212> DNA 

<213> Homo sapiens 

<220>

<223> 3125156 

<400> 30 

tccccccctc agcctccccc ccccccacrg gcatatggtc ctgccccttc taccagaccc 60
atgggccccc aggcagcccc tcttaccatt cgagggccct cgtctgctgg ccagtccacc 120
cctagtcccc acctggtgcc ttcacctgcc ccatctccag ggcctggtcc ggtaccccct 180
cgccccccag cagcagaacc acccccttgc ctgcgccgag gcgccgcagc tgcagacctg 240
ctctcctcca gcccggagag ccagcatggc ggcactcagt ctcctggggg tgggcagccc 300
ctgctgcagc ccaccaaggt ggatgcagct gagggtcgtc ggccgcaggc cctgcggctg 360
attgagcggg acccctatga gcatcctgag aggctgcggc agttgcagca ggagctggag 420
gcctttcggg gtcagctggg ggatgtggga gctctggaca ctgtctggcg agagctgcaa 480
gatgcgcagg aacatgatgc ccgaggccgt tccatcgcca ttgcccgctg ctactcactg 540
aagaaccggc accaggatgt catgccctat gacagcaacc gtgtggtgct gcgctcaggc 600
aaggatgact acatcaatgc cagctgcgtg gaggggctct ccccatactg ccccccgcta 660 
gtggcaaccc aggccccact gcctggcaca gctgctgact tctggctcat ggtccatgag 720 
cagaaagtgt cagtcattgt catgctggtt tctgaggctg agatggagaa gcaaaaagtg 780 
gcacgctact tccccaccga gaggggccag cccatggtgc acggtgccct gagcctggca 840 
ttgagcagcg tccgcagcac cgaaacccat gtggagcgcg tgctgagcct gcagttccga 900 
gaccagagcc tcaagcgctc tcttgtgcac ctgcacttcc ccacttggcc tgagttaggc 960 
ctgcccgaca gccccagcaa cttgctgcgc ttcatccagg aggtgcacgc acattacctg 1020 
catcagcggc cgctgcacac gcccatcatt gtgcactgca gctctggtgt gggccgcacg 1080 
ggagcctttg cactgctcta tgcagctgtg caggaggtgg aggctgggaa cggaatccct 1140 
gagctgcctc agctggtgcg gcgcatgcgg cagcagagaa agcacatgct gcaggagaag 1200 
ctgcacctca ggttctgcta tgaggcagtg gtgagacacg tggagcaggt cctgcagcgc 1260 
catggtgtgc ctcctccatg caaacccttg gccagtgcaa gcatcagcca gaagaaccac 1320 
cttcctcagg actcccagga cctggtcctc ggtggggatg tgcccatcag ctccatccag 1380
gccaccattg ccaagctcag cattcggcct cctggggggt tggagtcccc ggttgccagc 1440 
ttgccaggcc ctgcagagcc cccaggcctc ccgccagcca gcctcccaga gtctacccca 1500 
atcccatctt cctcccaaac cccctttcct ccccactacc tgaggctccc cagcctaagg 1560
aggagccgcc agtgcctgaa gcccccagct cggggccccc ctcctcctcc ctggaattgc 1620
tggcctcctt gaccccagag gccttctccc tggacagctc cctgcggggc aaacagcgga 1680
tgagcaagca taactttctg caggcccata acgggcaagg gctgcgggcc acccggccct 1740
ctgacgaccc cctcagcctt ctggatccac tctggacact caacaagacc tgaacaggtt 1800 
ttgcctacct ggtccttaca ctacatcatc atcatctcat gcccacctgc ccacacccag 1860 
cagagcttct cagtgggcac agtctcttac tcccatttct gctgcctttg gccctgcctg 1920 
gcccagcctg cacccctgtg gggtggaaat gtactgcagg ctctgggtca ggttctgctc 1980
ctttatggga cccgacattt ttcagctctt tgctattgaa ataataaacc accctgttct 2040 
gtgaaaaaaa aaaaaaaaag 2060






